skip to Main Content

Guest Post by Miguel Escobar Twitter | Youtube | Blog | Website

Power Query Magic: The Ultimate and easiest way to consolidate multiple tables, sheets, text and/or csv files

Power Query Magic:  The Ultimate and easiest way to consolidate multiple tables, sheets, text and/or csv files
(Click for Full-Size Version)

At some point in the life of an Excel user, we have all faced a similar dillemma. How can I combine multiple sheets, tables, csv or txt files? (can I combine them all together??)

How we used to solve this scenario

Back in the day (before Power Query) we actually had some ways to do so but they were not so user-friendly and they relied heavily on coding or some tedious way of doing it. The most common ways were:

  1. Using SQL Statements to join multiple files
  2. Creating a VBA code that will do the job for me
  3. Going with the tedious way of combining the files manually (perhaps with Excel or Access)

But now we have an easier and optimized way of doing this..let’s find out how

Intermission: How Power Query changes the game

Power Query changeBefore we start, let me say this out loud – Power Query is the best tool that Microsoft has created for Excel in the last 5 years. It’s like having an user-friendly interface that creates some wicked cool ultra optimized VBA code for me without me needing to learn any VBA at all Sonrisa

The target audience that Power Query has is much broader than that of Power Pivot. Mainly because Power Query focuses on cleaning, enriching and preparing your data – which is something that most Excel users do on a daily basis – and because of its ease of use.

Don’t get me wrong. I love Power Pivot and always will, but in order for my Data Model to have the most optimal shape I’d go through the Power Query experience rather than the DAX experience for now.

Ok – had to get that off my chest. Let’s get back to our post.

Consolidating / Combining multiple files with Power Query

Optimal coding

So far I’ve already blogged here and here and created a few youtube videos explaining how to combine multiple tables and / or sheets from Excel files and Ken Puls also did his thing with the csv files.

Now its time for this to reach a new level, so I want you, the reader, to download this zip file that contains the needed files for you to test this out. I call this the Ultimate Combination.

 

Download the Ultimate Combination

You’ll download a zip file that has the file with the PQ Query ready to go and a folder with the sample files

  • Now, let me guide you through the process on how to use this!

Step 1: Extract the files and find the Query that does the magic

Let’s see what’s in that zip file:

Compressed folder

What you’ll get inside that zip file

Now that we know what’s in it, let’s unzip that and once you do that then we’re going to open the file with the name “Ultimate Combination.xlsx” that we saw on the image above.

Note: Before you open that file, please make sure that you’ve  installed Power Query and if you haven’t yet then you can click here to download Power Query.

Once you open that file, please navigate to your Power Query ribbon and then click on Workbook Queries. You’ll be able to see one query on the query pane to your right like this:

Find your Workbook queries

Step 2 (final step): Point it to your folder

Now that we found the query, you can right click on it and hit Edit so we can take a look at it.

Edit a PQ query

Now you’re going to see an error on the top of the query that would look like this:

Error for folder not found

and the reason behind this error is that it is actually pointing to a folder on my local machine. In order to fix this we’re going to have to change that folder somehow and here’s how:

Choose Folder

and then click on browse so you can navigate through a friendly “choose folder” window and select the folder that you unzipped before with the name of “Best Way to Combine”.

Now just let Power Query do all the magic and, when it finishes, navigate to the final step called “Expanded”. You’ll see a table that will look like this one:

Consolidated Table

(consolidated table)

And in this consolidated table you’ll first see the column of the name of the file, then the extension of that file and then, if it was an excel file, you’re going to get information about what object it was from that excel file like what sheet or what table (Sheet Name, Item Name & Kind). You can read more about this on my previous blog post here.

After those columns, you’re going to get all the columns that were found on your files except for the last column which is called Total Rows that shows the total number of rows that were consolidated from that specific file or object. If you continue scrolling down then you’re going to see the result of combining all your data.

You can right click and delete the columns that you don’t need or filter the data as desired and then hit Close & Load so you can load that table into Excel or Power Pivot directly.

Now we can celebrate

Celebration

Things to take in consideration

I’ve written the code so it can read txt files as delimited by a comma by default but you can change that by changing the code found “The Formula” step like this:Change how to read TXT files

This query is consolidating anything and everything, so if some files have some columns but the others don’t then we will import that column and leave blank values for the files that didn’t have that column  Columns not present

By reading the previous consideration, you might have guessed that the query is dynamically selecting all the columns available through all the files. This is an advanced topic and you can bet that we’ll cover it in our book (read at the bottom of this page for more info). To see the list of columns that we found from all the files you can look at the formula bar found on the MyList step

ColumnNameList

 

 

 

 

 

Here’s a cool feature, you can actually check the data that is about to be combined before it happens and see how many rows you’re about to combine. You can check this by going to the “Here we go” step and see this table for yourself

Audit before combining

On a final note, you can actually just take the bits of M code that you need and totally adjust it to your needs. This just proves the flexibility and ease of use that Power Query has and, remember, this solution works for both Excel 2010 or Excel 2013 (with Power Query installed, of course).


Interested in Learning How to Do this Kind of Thing?

power query

Our New Online Power Query Course just launched this year. You’ll learn Power Query from the best in the business, two Excel MVPs Ken Puls and Miguel Escobar. We’ve included 7 Hours of Self-Paced Video over 31 Courses, with lots of example files, and a Digital Badge powered by Acclaim.

CLICK HERE to claim your spot.


New Power Query Book coming out soon!

A book written by Ken Puls & myself. If you enjoyed the approach that Rob took with his book and how practical it is then you’re going to LOVE this one as well.

Be sure to check out Ken Puls’ blog and mine for some cool tricks with Power Query and things that you might expect to be covered in our book.

Also expect an upcoming blog post of mine in dissecting the M code written for this solution.

Power Query for Excel

Click on this image to pre-order the book!

Special thanks to Avi and Rob for this opportunity to post on this blog.

Avi’s Note: Miguel, thanks to you for your post. M is for (Data) Monkey is going to be my GO-TO book for Power Query as soon as it comes out!

This Post Has 76 Comments
  1. Great post and very useful. Thanks! I’m digging deeper into power query more and more, especially to feed powerpivot. It’s the tail that wags the dog!

  2. I have a point of contention with Power Query, but perhaps you can tell me of a way to fix the problem. I have found that if I ever want to Insert a step, or make any changes whatsoever to an existing query, it refuses to refresh the data model. I then have to disable the Load, and re-enable. When I disable the load, it wipes out the existing table in the model, which means I loose all my calc columns or measures that I’ve added to that table, not to mention the breaking of the relationships to other tables. I dread making any query changes, because it always breaks my models. Do you have any ideas on how to get around this??

    1. Hey Rita!

      Unfortunately, that is a known bug of the whole Power Query + Power Pivot experience and other bloggers have posted about it as well. It’s a current limitation because of the actual codebase that Excel uses but Microsoft has guaranteed that this has been fixed for the next big release of Office.

  3. Hi Rob, I have purchase both books published by you on PowerPivot. And I found it really useful. I am really eager to read this new book on power query and I have a request for you:

    Please share the supporting files with the book…I was little disappointed not finding anything for PP Alchemy.

    1. Hey Abhay!

      The actual writers of the Power Query book are Ken Puls & myself (Miguel Escobar). I’m sorry to hear about the PP Alchemy situation but I can guarantee you this about the book and why you’ll love it:
      – A technical book by humans for humans
      – it comes with Systematic way of learning (similar to the one Rob had with his first book)
      – Example and visual driven
      – Practical Utility (real-life scenarios)
      – Downloadable content (you can download the workbooks!)

  4. There’s no point in loading the same data more than once. If an Excel sheet includes a table and/or a named range, only the sheet data should be imported into PowerQuery.

    In your approach – maybe intentionally – data from file7/Feb, file8/Apr+May and file9/June will be duplicated, i.e. 257922 rows instead of 173648.

    1. Hey Frank!

      Sorry if the post caused any confusion. It is completely intentional and is meant to be something rather advanced in order to show how Power Query reads an Excel file. You can read more about it on this blog post
      https://thepoweruser.wordpress.com/2014/11/24/the-function-to-combine-multiple-workbooks-data/

      The reason behind this approach is that you might have multiple tables in just one sheet and perhaps you don’t want to load things directly from the spreadsheet. The good thing about Power Query is that you can always filter out the steps that you don’t need or the files or tables that you don’t want to combine.

      Edit PS: Glad that you found it!! it means that you read the whole thing and I appreciate your support! 🙂

  5. Keep up the good work Miguel, I’ll be in line for your book. As a seasoned VBA developer I am now avoiding VBA in preference for using tools like this. I’m selling it on the basis of transparency and robustness and a learning curve which can be negotiated by Excel power users without having to understand complex data transformations in VBA

    1. Thanks for the kind words, Anthony! A new post should be coming out soon. “Top 5 Ways Power Query Helps Excel Users”

  6. Hi Miguel, enjoyed your post and looking forward to the book. I see the latest version of Power Query (using Excel 2010) does not have any options to load straight to the Data Model. Any advice how to achieve this or can I access Power Query directly using Power Pivot?

    Regards

    Andy

    1. hey Andy! the thing with Excel 2010 is that the Data Model doesn’t exist (integrated with Excel). What I usually do is basically create the connection and then use that connection from within Power Pivot.

      Let me know if that helps!

      1. Miguel hi, Many thanks for your prompt response. I need a little hand holding as to how to achieve your suggestion. I want to load into Power Pivot a file with > 1 million rows, so cannot link the downloaded worksheet data to PPivot. Can I establish a link to Power Query that give me all the features of PQ within PPivot.

        Regards

        Andy

        1. i haven’t really tried this myself but teóricamente you could first load the connection and then, inside Power Pivot, go to existing connections and use that Connection.

          Let me know if that helps!

  7. How would Power Query handle the import of data from multiple tables from multiple databases? What is also required is the continuous update of that data so that the reports derived from such data is always up to date?

    1. There are so many ways to accomplish this! I can guarantee you that it is possible and basically the key is having all of the necessary information to access such data in a “parameter-like” table.

      Power Query follows the data refresh concept that you find in any connection found in Excel, so for continuous update of that data you’ll have to check the “Connection Properties” and customize it to your needs.

      Hope this helps!

  8. Great post. Thanks.

    I am working with Excel 2003 XML files (I think it’s called SpreadsheetML — it’s very different from current Excel 2010 XML file type!). That is the file format provided by an daily data retrieve from one of my vendors . I can open the XML fine in Excel 2010, but Power Query doesn’t like it.

    I get an error when I try to add a column with =excel.workbook([Content]).

    Any ideas?

    1. I haven’t had the chance to try it but I’m pretty sure that if you’re using an XML file then you’ll need to treat it as an XML file and use this function:
      Xml.Document([Content])

      instead of the Excel.Workbook one

      Hope this helps!

      1. Hi Miguel — Thanks!

        That got me a little further. There is a table within the XML called “Worksheet” – when I drill down into that Table, I find four un-named tables. There are four worksheets in the excel file when I load directly into excel.

        When I load the XML 2003 file into excel, there are four worksheets — four different daily database retrieves, so they have different columns.

        When I drill down into the tables further, I get and and headers. Can’t figure out if there is a way to translate this into a table.

        1. hey Hutch,

          I haven’t really tried anything related to xml yet but shoot me an email! if you can, attach the file to it with some dummy data and perhaps an image of what you’re looking for.

          Here’s my email: [email protected]

  9. Excellent Stuff Miguel this is awesome!!!!!!!! I have been working with PowerPIvot for a while and sometimes I struggle to massage data before importing it to my models. I am going to dive deep into Power Query so I will defenitely buy your book as soon as it comes out. Keep up the good work. Regards from Spain.

    1. Hola Jesus!

      Disculpa el retraso en mi respuesta. Por alguna razón no recibí la notificación de un nuevo comentario en esta entrada o tal vez quedo sepultada sobre otros correos.

      Gracias por tus comentarios para lo que necesites aquí estamos!

      Saludos desde Panamá,
      ME

  10. Miguel,

    In “The Formula” step you indicated the comma could be replaced by any character. I’ve been trying to figure out how to specify a tab character. I’ve tried the excel CHAR() function and VBA CHR() function but neither of those work.

      1. Miguel,

        Thanks. Just moments after I made the post, I stumbled across the answer in the Microsoft Power Query Formula Library Specification document. For those wondering “#(cr)” is for carriage return and “#(lf)” is line feed

  11. I have two queries (same columns but different data) to which I’ve added columns using formulas. I now want to append the two queries (with the added columns) into one large table, but the added columns with formulas aren’t being included. What can be done?

    Thanks so much for your help!

    1. Hey!

      This post covers a combination from files within a folder but you take the bits that you need from it and create the solution that you need. I wouldn’t know what to say as I haven’t seen the file or where does columns were added but it could be that you’re reading a table or a named range and those columns are outside that table or named range?

      I’m leaving this link as it might help you get more detailed information on this topic:
      http://powerquery.training/portfolio/append-data-from-files/

  12. Hi Miguel

    Great post, I’ve used the workbook in a few projects. Unfortunately I have a new project coming up that may pose a problem:

    The project requires the combination of 50 Excel tables which have *daily* dates as the column headers – spanning many years. Therefore I’d like to know if it possible to unpivot these into a single Date column before step “Extended”. I ask because there are potentially thousands of columns I am worried that Power Query will error out when getting to step “Expanded” and thereby prevent me from unpivoting at this point in the script before loading it.

    Thanks,
    Simon

    1. hey Simon,

      Yes, you can! you can customize this as you need and do a “unpivot other columns” on the table values that you need prior to expanding that column. That’s the true beauty of Power Query, you can add more transformation steps as needed wherever you need them to.

      The other way is to create a function, but I tend to not use functions that much.

      This link might help you to get more information about this pattern:
      http://powerquery.training/portfolio/append-data-from-files/

  13. Hey Miguel!

    This tool is great! I’ve got one problem however that I’m hoping you can help figure out:

    The files I’m merging have a carriage return (Alt + Enter) that was added in some column names. When this tool processes those columns, it puts quotations around them in the header name but does not pull in any of the data that existed in the rows of those columns (they’re just null). I’ve verified that there is data that exists, so I’m looking for a way to make it handle columns with hard returns.

    Thanks!

    1. Hey Justin!

      I’d try to load a sample file using just the From File feature of Power Query and see if it behaves the same. If it does, then I’d probably read that file as a .txt file and then do some transformation on it in order to clean it and more and then apply that same transformation for every file that you have so it can read the data from that column.

      Hope this helps!

      1. Thanks for getting back to me, Miguel.

        I did it just as importing from file as Excel and it works fine (with the exception that it truncates anything after the carriage return in the column header, which is fine); it pulls in all of the data related to that item.

        I was hoping to use this tool since I have data from 130 stores to combine into one master file.

        I’m thinking that the process breaks down when it tries to find the “text [carriage return] text” header that’s populated in the “MyList” and doesn’t get a match.

  14. Great post! Love the combination.

    I am having some trouble with network files though. When I have the files saved on my desktop, they combine no problem. But when I try to use files saved on the network, it just keeps showing the in progress circle and never finishes. I want to use the network file because the excel files I am accessing are updated manually every day or so and I want to be able to refresh all the information. Is there some way to make the query more network friendly? Or matter set the query to download the files onto the computer before extracting the information?

    Thanks!

    1. Hey Daniel,

      This behavior is commonly due to the privacy levels. You’ll need to check your datasources and set that location (the network one) as public so you can do this combination or you can just click on the “enable fast combination” option in the Power Query Options – it should be under the privacy section labeled as “ignore the privacy levels and potentially improve performance”. That should do the job.

      Hope this helps

      1. Thanks Miguel!

        I can now combine the files! Although it does still take a while. I think the query is pulling every row from the work sheet, including the blank ones. It will load to the data model, but I am having trouble loading to a worksheet.

        I know I can have Power Query delete the blank rows before I load to a worksheet, but it still takes a long time to download the files. The largest file has less than 100 rows. I tried using Table.firstN to select just the first 1000 rows (I figure one order of magnitude is enough wiggle room for extra rows in the future), but I haven’t had any success with it., although I am new to M language, so that is not really surprising.

        Is there a way to set Power Query to ignore blank rows when it imports the data?

        I just pre-ordered M is for (Data) Monkey. Can’t wait until it comes out! I have really enjoyed other Power Pivot Pro books!

  15. Hi Miguel,
    Great post, it’s really astonishing. I was attempting to perform a very similar exercise combining multiple csv files with broadly the same fields (Payroll files, so some people had one kind of overtime, others had shift allowances and so on). The other annoyance was that there were an awful lot of them, some in excel, some in csv and it was critical that the filename be recovered in some way and held in a field. I made attempts in excel, power query and finally approached it with access, in rather a clumsy fashion but got there in the end. This ticks all my boxes, so well done.
    Rory

  16. Thanks Miguel! I’m bowled over by Power query and this post, being a complete newbie. I wanted to know how you would treat a similar case but with far more complex tables where there are no identifiable headers. Thanks again for a great start.

    1. Hey Alex,

      Interesting idea! just one quick question, do you find it difficult to combine/merge/append tables with Power Query right now?

      Heres’ a video on how you can append multiple tables from virtually any data source: https://www.youtube.com/watch?v=XPWnPAhQEiQ

      it’s quite easy! and you have a full blown window to see the a preview of the results before applying it which is an fail-safe feature that Power Query has

          1. Yep, but complication (probably not so common among users) import of model in Power BI delete current model and all creations. You cannot just add it up. E.g if you have models (old) for different years, you cannot mix it up. Just hard way to find source data, run power query, upload to power BI, make everything again

        1. I highly encourage you to check the supporting document again. When you import your Excel Data Model you bring everything 🙂 you don’t start from scratch but you start from where you actually left.

          I just adds up to what you already have – the Power BI Desktop is an amazing tool!

  17. Hi Miguel –
    If your folder is on Sharepoint or elsewhere on the web, is there a way to connect to it in this way, or do you have to go file by file?
    juanito

  18. Great and useful post. Can you please let me know where or how I can access the data model and see the relationships on PowerQuery ribbon, thanks

  19. This is excellent. Thanks so much for posting it.

    It’s tremendous that you can nest transformation steps into the formula that generates the custom columns. For instance in your example you promoted the headers then combined the files. I am working with some data where I need to remove some rows, unpivot columns, add a custom date field and then combine the files. Pretty amazing that it seems to be no problem.

    Looking forward to your book. Thanks again.

  20. Great Post… but I have a Question… I want to append multiple files located into a folder but I have to applie some transformations to these files, the transformation is similar to each file. So I want every file gets the same transformation and all appended dynamically, meaning that for every new file in the folder, the model must append automatically the content.

    1. You’ll probably want to create a function that would apply the transformation first and will output a new column with the table ready to be appended. then use that column to append all the tables that you need

  21. Miguel – thanks for a very informative post. I have a question for you. I’m using power query to combine 150 text and csv files with a total size of 300MB. The returned recordset is 3,000,000 rows and is compressed to 20MB in powerpivot. The flat files are on a network drive. The excel power query report is on the same network drive. This query takes about 10 minutes to run. I have 64 bit Excel 2013 with 16GB of RAM. Any ideas how to make this query run faster? Thank you.

    1. hey Phil! sorry for the late reply but I’d try to go right into a Table.Combine or Combine binaries function. You might want to contact the Power Query team to see if there’s a bug.

  22. Miguel,

    Thank you very much for your work on this. It works like a charm except for 1 problem. In a CSV if a column header is “1 up” there is no problem, and on the My List the column is shown as 1 up without quotes. However, if the column header includes a comma like “1, up” the the My List shows the column name as “1,up” with the quotes. The problems then lies in the expanding step, no data is pulled for the column 1, up. I assume this is due to the names not matching since the original version has no quotes and the My List version includes quotes.

    Any thoughts on how I can get around this?

    Thanks

    1. hey Cliff

      I just tried replicating the issue and I did not get the same results as you. This is how my csv file looked like:
      “1,up”,”2,up”,”3,up”
      1,2,3
      1,2,3
      1,2,3

      and all of the headers came without any quotes. Would you mind sending me a csv file with a simplified version of what yours has? [email protected]

  23. Hello Sir,
    Can you please tell how to export that big table from Powerpivot to a big CSV ?

    Thanks in Advance.

    1. You can get the big table from Power Query though. When you load it to Excel, load it as a table. Then you have the big table in excel and you can save it as an xlsx or copy and past to a new file to save it as a CSV while keeping the other file that did the combination.

  24. Hi Miguel,

    I have a problem with my append with some data which I am trying to create a report out of.

    I have worked out how to get an API feed into PQ and then learnt how to append it to an excel file which contains the same data but historical dates.

    The API feed is set to bring in the last seven days of data and then be added to the historical to create a new query (which is the appended one).

    However when I update the file, I am starting to loose data in the middle, as the API is the last seven days (as it can only bring in 25,000 rows) it means I loose data that is before those seven days but after the last date in the historical data file. E.g If my API brings in data from the 24th to the 30th, but my historical data only goes up to the 15th, then I have lost the data for the 15th to the 24th.

    Do you know how I can append so that when I run the report the API data saves onto my historical data so I no longer loose dates?

    Ideally I would like to be able to append my API to my historical sheet, and the data from my API then adds to my historical sheet, so when I am a week down the line I don’t loose any data.

    Thanks a lot.

    1. Sadly, Power Query nor Power Pivot or any of the modern tools know how to do a Delta or incremental update. They work by simply always performing the whole operation when clicking on the “refresh” button, so that means that you’ll have to store such data someplace, somehow and then append it to the query that you’re working with.

  25. Does this still work this way for xml in 2016? Having trouble creating the custom column so I can combine the xml binaries. PS if you did a series on handling XML would be awesome always find it hard in power query, but trying again, usually I resort to converting it to csv with python as power query handles that much easier.

    Would so love to be able to do it all in power query though

  26. Ken/Miguel – I just bought your book (M is for (data) Monkey. I could not find any tips on how to bring in multiple access files in a folder via power query. I know how to do that if the files are excel files. However, I don’t know how it work if the files are in access. Could you please provide some tips? Thank you

    1. Hey Ting,

      I’ve tried numerous times to post a response here, but for some reason is not going through. Please check Chapter 21 on creating custom functions, or even better, try watching this video htt ps://youtu . be / RAmSdxt863s

  27. Are there any known limitations with using PowerQuery to combine old macro-containing XLS files (i.e. NOT .xlsx nor .xlsm files). I have been fighting with PowerQuery for sometime trying combine binaries on a 9600+ old XLS files. I keep getting this error: Details: “Query ‘Sample Binary Parameter4 (2)’ (step ‘Source’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.”

    I found Ken Puls’s solution for that error message, https://www.excelguru.ca/blog/2015/03/11/power-query-errors-please-rebuild-this-data-combination/
    but following those steps has not solved the problem either.

    The only thing that seems to work is to open the old .xls file and re-save it as a .xlsm file. Which isn’t something I’m looking forward to doing on almost 10,000 files. I’ve tried at least one “xls to xlsx” converter macro and unfortunately, likewise, that did not work.

    Thank you!

    1. I tried a different XLS converter macro and that one DID work, now those files are PowerQuery-able. So my particular problem is currently solved. However still wondering why I was having trouble with those old XLS files and the Combine Queries function… anyone know if that is a limitation of PowerQuery?

Leave a Comment or Question