skip to Main Content

Today I am sharing how I use Power Query as a file audit tool to support a regular task I complete every week.  The objective of this post is to create broader awareness of how great Power Query is to solve different types of problems, and to possibly get you thinking about other problems you have that could be solved with this tool.

Copying files between folders

My use case is this.  I refresh multiple workbooks for a client each week prior to publishing them on a shared network folder.  I use Power Update as the tool to refresh a folder containing some 40 Power Pivot workbooks on my PC.  I then upload the files to the customer network in a temp folder prior to copying them across to the shared network folder.  I need to use this 2 step process for various reasons that are not relevant to the topic of this post.

File Open Conflicts

When I copy the files across in batch from the temp folder to the shared network folder, sometimes users have one or more of the files on the network folder open and this causes a file copy conflict.  When this happens, the individual files in question are not copied to the destination.

Needle in a Haystack Problem

So the problem is I now need to compare the list of files in my temp folder with the list of files on the network folder and find out which ones did not copy across. The problem is made much harder by the fact that the destination folder contains a lot of other files that are not subject to the weekly refresh.  Can you see with my demo files below which files on the left are not in the list on the right?  Of course you can do it, but it is much harder with 40 files, and besides there has to be a better way.

image

Power Query to the Rescue

After I was half way through the list the first time, I realised I had to find a better way to solve this recurring problem.  It occurred to me that Power Query was a great tool for such repetitive work.  I have recreated the process in a demo that I will explain below.

Create a Source File List Query

The first thing to do is open a new blank Excel workbook and create a query that loads up a list of the files from the source folder.

image

The query will look like this (below)

image

Not all the data that Power Query offers is needed – just the 2 columns indicated are needed for the comparison.  So multi select the 2 columns to keep, right click, and then select “remove other columns”.

image

Then save the query but select “Close and Load to…” as shown below.

image

Select “Only Create Connection”.

image

Repeat the process for the Destination folder.  This will leave 2 queries as shown below, but no data loaded yet – just connections to the folders.

image


Interested in Learning How to Do this Kind of Thing?

power query

Our New Online Power Query Course just launched this year. You’ll learn Power Query from the best in the business, two Excel MVPs Ken Puls and Miguel Escobar. We’ve included 7 Hours of Self-Paced Video over 31 Courses, with lots of example files, and a Digital Badge powered by Acclaim.

CLICK HERE to claim your spot.


Compare the files in the 2 Queries

The next step is to use Power Query to compare these 2 file lists.  Create a new query by referencing the Source query shown above.

Right click on source, and then select “reference”.

image

Rename the query “Missing Files”.

The next thing is to merge this new query with the second query (destination).  Select the merge queries option as shown below

image

In the dialog, join the 2 tables using a left outer join – ie keep the files that are in the first list (source) that match with the second list (destination).  This step has the effect of filtering out all the other files in the destination folder – these other files are not relevant to the task at hand.

join

There is now a new column in the query.  Click on the expand button

image

Then deselect the Name column and deselect the “use original column” option as shown below.  Click OK.

image

There is now a table that contains all the source files, the date of the source file and the date of the destination file

image

The next step is to add a new column that compares the 2 date columns.  This is easy to do with a PQL “if” statement.

(Note: I don’t like using the terms “M” or “Power Query Formula Language”.  The first is too short, the second is too long. So I now refer to the Power Query Language as “PQL”, pronounced Peequel, just like SQL seequel.  If you agree that this is a good name change, please vote for this idea).

Note/remember that PQL is case sensitive.  The syntax for the if statement is as follows

if [some column] = [some other column] then something else something different

The key words are if, then, else – always lower case.

It is easy to write this custom formula. First click “Add Column” from the menu, then type the formula in the wizard as shown below

match

Then filter this new column to remove the matches

image

Remove the last 3 columns – they are not needed any more.

image

Then click Close and Load, what is left is a list of files that haven’t been copied from the source to the destination.

image

Power Query Records the Process Steps

Most readers of this blog will already know this, but one of the many great things about Power Query is that it creates a repeatable process.  The table shown above can be refreshed at any time (right click in the table, then select refresh) and the comparison will be done again on the current file contents.

image

Other solutions to this Problem

Of course there are many ways to solve this problem. I could write a batch script to do the copying and log the success of each file copy.  I am sure there are many other ways.  But there are very few ways that are in easy reach of the average user and that can be done via the UI quickly and easily as shown above.

Matt Allington

Matt Allington

Matt Allington is a Microsoft MVP specalising in Power Pivot, Power BI and Power Query Consulting and Training based in Sydney Australia. Visit Matt's blog here.

This Post Has 10 Comments
  1. 1. If you Make the merge on both the name + the date (by using ctrl), you can the save the date comparison steps.
    2. The left outer join may not be the best choice
    3. I will vote for PQL

  2. Alternatively, you could paste the following code into notepad, save as a .bat file and run as needed. Retitle the folders to suit your needs.

    ROBOCOPY C:\SourceFolder\ C:\DestinationFolder /l /ns /ndl /njs /njh /log:C:\LogFolder\MissingFiles.txt

    1. SQLRockstar,

      After you get the list of the files from the folder, expand the Attributes column to get the size and more information of the files.

      Regards

  3. Nice tip!
    I have used PQ as a tool to automate a lot of tedious Excel work in our organization the last year. Extracting data from different sources, transforming, fixing, and outputting for instance transactions ready to be loaded directly into our accounting system 🙂

    1. If you did a double join on the file name and date/time, then yes. There are many ways to solve these problems, and i quite often find incrementally better solutions after successfully executing one approach.

Leave a Reply

Your email address will not be published. Required fields are marked *