Today I am sharing how I use Power Query as a file audit tool to support a regular task I complete every week. The objective of this post is to create broader awareness of how great Power Query is to solve different types of problems, and to possibly get you thinking about other problems you have that could be solved with this tool.
Copying files between folders
My use case is this. I refresh multiple workbooks for a client each week prior to publishing them on a shared network folder. I use Power Update as the tool to refresh a folder containing some 40 Power Pivot workbooks on my PC. I then upload the files to the customer network in a temp folder prior to copying them across to the shared network folder. I need to use this 2 step process for various reasons that are not relevant to the topic of this post.
File Open Conflicts
When I copy the files across in batch from the temp folder to the shared network folder, sometimes users have one or more of the files on the network folder open and this causes a file copy conflict. When this happens, the individual files in question are not copied to the destination.
Needle in a Haystack Problem
So the problem is I now need to compare the list of files in my temp folder with the list of files on the network folder and find out which ones did not copy across. The problem is made much harder by the fact that the destination folder contains a lot of other files that are not subject to the weekly refresh. Can you see with my demo files below which files on the left are not in the list on the right? Of course you can do it, but it is much harder with 40 files, and besides there has to be a better way.
Power Query to the Rescue
After I was half way through the list the first time, I realised I had to find a better way to solve this recurring problem. It occurred to me that Power Query was a great tool for such repetitive work. I have recreated the process in a demo that I will explain below.
Create a Source File List Query
The first thing to do is open a new blank Excel workbook and create a query that loads up a list of the files from the source folder.
The query will look like this (below)
Not all the data that Power Query offers is needed – just the 2 columns indicated are needed for the comparison. So multi select the 2 columns to keep, right click, and then select “remove other columns”.
Then save the query but select “Close and Load to…” as shown below.
Select “Only Create Connection”.
Repeat the process for the Destination folder. This will leave 2 queries as shown below, but no data loaded yet – just connections to the folders.
Interested in Learning How to Do this Kind of Thing?
Our New Online Power Query Course just launched this year. You’ll learn Power Query from the best in the business, two Excel MVPs Ken Puls and Miguel Escobar. We’ve included 7 Hours of Self-Paced Video over 31 Courses, with lots of example files, and a Digital Badge powered by Acclaim.
CLICK HERE to claim your spot.
Compare the files in the 2 Queries
The next step is to use Power Query to compare these 2 file lists. Create a new query by referencing the Source query shown above.
Right click on source, and then select “reference”.
Rename the query “Missing Files”.
The next thing is to merge this new query with the second query (destination). Select the merge queries option as shown below
In the dialog, join the 2 tables using a left outer join – ie keep the files that are in the first list (source) that match with the second list (destination). This step has the effect of filtering out all the other files in the destination folder – these other files are not relevant to the task at hand.
There is now a new column in the query. Click on the expand button
Then deselect the Name column and deselect the “use original column” option as shown below. Click OK.
There is now a table that contains all the source files, the date of the source file and the date of the destination file
The next step is to add a new column that compares the 2 date columns. This is easy to do with a PQL “if” statement.
(Note: I don’t like using the terms “M” or “Power Query Formula Language”. The first is too short, the second is too long. So I now refer to the Power Query Language as “PQL”, pronounced Peequel, just like SQL seequel. If you agree that this is a good name change, please vote for this idea).
Note/remember that PQL is case sensitive. The syntax for the if statement is as follows
if [some column] = [some other column] then something else something different
The key words are if, then, else – always lower case.
It is easy to write this custom formula. First click “Add Column” from the menu, then type the formula in the wizard as shown below
Then filter this new column to remove the matches
Remove the last 3 columns – they are not needed any more.
Then click Close and Load, what is left is a list of files that haven’t been copied from the source to the destination.
Power Query Records the Process Steps
Most readers of this blog will already know this, but one of the many great things about Power Query is that it creates a repeatable process. The table shown above can be refreshed at any time (right click in the table, then select refresh) and the comparison will be done again on the current file contents.
Other solutions to this Problem
Of course there are many ways to solve this problem. I could write a batch script to do the copying and log the success of each file copy. I am sure there are many other ways. But there are very few ways that are in easy reach of the average user and that can be done via the UI quickly and easily as shown above.