by Matt Allington

The more I use Power Query, the more I want to use Power Query – and of course the more I learn.  I have seen quite a few references on the Web to “consolidating multiple Excel Workbooks into a single table”, but I came across a problem working with a client recently where I had to undo someone else’s ‘helpful’ partitioning of data into separate worksheets in the one workbook.  You may have come across the problem yourself.  Someone has been collecting data over time, and they start a new sheet every month.

image

Well intentioned but actually quite unhelpful.

The procedure I use here to bring this all back together in a single table is the same basic concept as consolidating multiple workbooks, however I have learnt a thing or two since the first time I copied someone else’s pattern on how to do this.  Specifically I have learnt the correct and easiest way to create and work with Power Query functions.

The basic process/pattern is:

  1. Create a query to import a single sheet of your data.  Do all the transformations you need here.
  2. Convert the basic query from step 1 above into a function (the easy way) while keeping some backwards capability.
  3. Create a second query that uses the function from step 2 and consolidates all the objects (Sheets in this example), but the same principles apply for any other object.

There are other approaches you can use to do this (as is often the case with Excel) but this procedure demonstrates how easy it is to write Functions if you understand the basic concepts.

First Create a new blank workbook edit a single sheet

Go to Power Query Menu\From File\From Excel.  You will see in the navigator the list of sheets.  Just select any one of these sheets in the list.   For this process to work, the column structure of the sheets needs to be the same.

image

I selected Jan above and then edited the query.  In this sample data, there are no transformations required.  If you need to reshape your data, then now is the time to do it, before you move on.

After you have completed your transformations, you will want to give the query a new name.  The generally accepted approach I have seen is to give the query a prefix like “fn” – I normally call mine fnCombine or something like that.

image

The next step is to change the Power Query Language so it turns this specific query into a function that is generic and can be reused.  There are easy ways and hard ways to do this – I now know the easy way – read on. 

Convert to Function

Switch to the Advanced Editor (View\Advanced Editor) and you will see the Power Query Language code generated by the UI.

image

Now I have seen more complex versions of the next step, but this is the easy way.  I learnt this when I attended Miguel Llopis’ Power Query session at PASS BA Conference this year.  The thing is you don’t need to write additional ‘let’ and ‘in’ statements as I have seen others do (and as I have done in the past).  All you need to do is to add a line of code that accepts the parameters.

(Parameter1,Parameter2,ParameterN) =>

That’s it.  It is really simple.  So you can see the extra line of code added as line 1 below.  Don’t worry about the other lines of  code from line 5 on.  These were all generated by the UI and are not important for the purpose of this post – yours will be different anyway.  The only lines that matter are line 1 and line 4.  For this demo, just ignore all the others.

(mySheet)=>
let
  Source = Excel.Workbook(File.Contents("C:\Users\matt\Documents\Blogs\Consolidate Worksheets.xlsx"), null, true),
  Jan_Sheet = Source{[Item="Jan",Kind="Sheet"]}[Data],
  #"Promoted Headers" = Table.PromoteHeaders(Jan_Sheet),
  #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ProductKey", Int64.Type}, {"OrderDate", Int64.Type}, 
  {"CustomerKey", Int64.Type}, {"SalesTerritoryKey", Int64.Type}, {"SalesOrderNumber", type text}, 
  {"SalesOrderLineNumber", Int64.Type}, {"OrderQuantity", Int64.Type}, {"UnitPrice", type number}, 
  {"ExtendedAmount", type number}, {"UnitPriceDiscountPct", Int64.Type}, {"DiscountAmount", Int64.Type}, 
  {"ProductStandardCost", type number}, {"TotalProductCost", type number}, {"SalesAmount", type number}, 
  {"TaxAmt", type number}, {"Freight", type number}, {"RegionMonthID", type text}})
in
  #"Changed Type"

If I just save it now – no other changes – it is now a function.

image

But I’m not finished.  The above function doesn’t do anything, because the parameter that is passed to the function in the first line of code is not used in the Power Query Language anywhere yet.  So I then changed line 4 like this.

Jan_Sheet = Source{[Item=mySheet,Kind=”Sheet”]}[Data],

All I did was replace the hard coded sheet name “Jan” with the Parameter passed to the function in the first line.  So as you can imagine, it is now possible to reuse this function over and over again.  Instead of it being a hard coded single purpose function that operates on the sheet “Jan”, I can now pass any number of different sheet names to the function, and the function will perform the exact same transformation over the specified sheet(s).

Now here is a second tip.  Rather than just changing the required line(s) of code, I recommend you duplicate the line(s), comment out the original line(s) and then change the duplicate copy.  I started to do this because sometimes I needed to come back and make changes to the query after I had converted it to a function.  It is much easier to make changes using the UI, but that is only possible if you convert it back from a function to a normal query.  If you are not an expert in the Power Query Language, it can be very difficult to work out how to “undo” the function conversion.  So if you keep the original line of code, it is much easier to go back to where you started.

After I duplicated the original line of code, what I actually end up with is this (note the double slash to comment out the original line of code – line 4.  And notice the duplicate line of code (line 5) with the modification that accepts the parameter from the function.  As a side bonus of doing it this way, it is much easier to see what changes you have made to the query.

(mySheet)=>
let
    Source = Excel.Workbook(File.Contents("C:\Users\matt\Documents\Blogs\Consolidate Worksheets.xlsx"), null, true),
    //Jan_Sheet = Source{[Item="Jan",Kind="Sheet"]}[Data],
    Jan_Sheet = Source{[Item=mySheet,Kind="Sheet"]}[Data], 
    #"Promoted Headers" = Table.PromoteHeaders(Jan_Sheet),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ProductKey", Int64.Type}, {"OrderDate", Int64.Type}, 
    {"CustomerKey", Int64.Type}, {"SalesTerritoryKey", Int64.Type}, {"SalesOrderNumber", type text}, 
    {"SalesOrderLineNumber", Int64.Type}, {"OrderQuantity", Int64.Type}, {"UnitPrice", type number}, 
    {"ExtendedAmount", type number}, {"UnitPriceDiscountPct", Int64.Type}, {"DiscountAmount", Int64.Type}, 
    {"ProductStandardCost", type number}, {"TotalProductCost", type number}, {"SalesAmount", type number}, 
    {"TaxAmt", type number}, {"Freight", type number}, {"RegionMonthID", type text}})
in
    #"Changed Type"

If at any time I need to edit the query, I can simply do the following.

  1. Put a double slash in front of line 1        //(mySheet)=>
  2. Remove the double slash from line 4
  3. Put a double slash in front of line 5

And bingo, I am back in business and I can use the UI again.

Save the function again and close.

Create a second query.

Next I created a second query that points to the same file, but this time instead of selecting one of the worksheets I selected the workbook itself.

image

When you edit this folder (which is actually the workbook) you will see a list of all of the sheets in a Power Query window.

image

You may have to do some filtering here before proceeding.  In this demo, I filtered out Sheet1 and Sheet2 using the filter icons.  I used the option to filter out rows that begin with the word “Sheet” to make it “future proof”.

So now I have a nice clean list of sheets that are in the workbook, and the name of each sheet is in a single column.  This is the secret sauce.  Because the sheet names are in a column, they can be “passed” to the function – and then the function will process the sheets one at a time.

image


Interested in Learning How to Do this Kind of Thing?

power query

Our New Online Power Query Course just launched this year. You’ll learn Power Query from the best in the business, two Excel MVPs Ken Puls and Miguel Escobar. We’ve included 7 Hours of Self-Paced Video over 31 Courses, with lots of example files, and a Digital Badge powered by Acclaim.

CLICK HERE to claim your spot.


Add a custom column to call the function

So then I added a custom column – I just typed a hand written formula in the box that calls the function.

Remember I renamed the query to be “fnCombine” – so that is now the name of the function I have to call.  So all I had to do is just write the code using the UI to help me.

=fnCombine([Item])

Power Query is case sensitive, so you have to type it exactly as the name of your function – FNCombine will not work!  Note how I have inserted the [Item] inside the function. [Item] is the column from the earlier step that contains all the sheet names.  All of the column names that exist in the query are conveniently located and available inside the “add column” editing window.

image

After inserting the custom column, you will see each row of the new column is a full table in its own right.  Each table is the full data from each sheet with the transformations applied.

image

Expand the Column

Now expand the column using the icon shown above.

Select the expand option and deselect “use original column name as prefix” down the bottom, then click OK.

image

Finally remove the first 5 columns that are not needed and you are done.  Close and load as required.

Here is the sample data I used to import.  Feel free to use it to practice.  Sample Workbook

Matt Allington is a professional Self Service BI consultant and Author specialising in Power Pivot and increasingly Power Query based in Sydney Australia.

  Subscribe to PowerPivotPro!
X

Subscribe

Matt Allington

Matt Allington is a Microsoft MVP specalising in Power Pivot, Power BI and Power Query Consulting and Training based in Sydney Australia. Visit Matt's blog here. 

This Post Has 26 Comments

  1. Hi Matt 🙂
    Can I ask, why not use the “Data” column?
    In this case, it is much simpler than function building and additionally much more efficient because we do not have to download a data from workbook second (third, fourth… twelves) time. We have all the data, that we need, all the time after our first step (= Excel.Workbook(File.Contents…..)
    So the second step (or third – after filtering the sheets) could be
    = Table.AddColumn(FilteredRows, “Custom”, each Table.PromoteHeaders([Data]))
    or .. if we are a little more advanced
    = Table.TransformColumns(FilteredRows, {“Data”, each Table.PromoteHeaders(_)})

    Of course we can do this in your way also 🙂

    One important note….. we can see (in PQ) the changes in month sheets only after saving the workbook.

    Regards

    1. Thanks for the comment Bill. As you will see in my note to Ken, the reason I did it this way is more about where I have come from rather than the specific capabilities of PQ. We are all learning and it is great to have the dialogue.

      I am not clear on how you handled the repeating headers in your code. I can see (now) that you can write this in the Power Query Formula language, but can the UI produce this?

      Of course you can filter out the headers from subsequent Sheets using normal filtering techniques.

  2. Excellent example. Thank you for being so detailed and for including the workbook.

    You and the other Power Query bloggers are the sole reason I am able to use PQ effectively. It’s very powerful but it’s best features are “hidden”. You are bringing them to the light of day!

  3. Matt, next time, start by going directly to your second query. Filter your sheets as normal then, instead of using the function to create a new column of tables, why not just expand the existing column of tables that’s there? 🙂

    1. Thanks Ken. Well in my case, the reason I did it this way is 2 fold. Firstly I came at it from the other direction – first I had the function skill and then had the problem. But secondly I am not clear how to handle the repeating headers on each sheet with this approach. Your comment seems related to Bill’s comment above. I would love to hear how to do that. Can it be done in the UI?

      I guess PQ is a lot like Excel – there are lots of different ways to do things, and the more we have these conversations, the more we all learn from each other – I love that.

      1. Despite what Ken says, I encourage you to try and do it your own way. Your own style 🙂 – we can always learn from each’s point of view and how we can all discover amazing things that this tool has to offer.

        Having said that, Power Query is by far the best Data tool that exists in Excel and it focuses on fixing the situation that all Excel users can relate to: Garbage in | Garbage Out….not anymore with Power Query!!

        Great post, Matt!

  4. Matt,

    Another great post that appears on the powerpivotpro website exactly when I can put it to work right away!

    Currently I am on a Power Query journey and have defined the following levels of achievement as targets:

    Level I: Fluent in using the Power Query UI and can follow M code when I see it (I have reached this).

    Level II: Fluent in writing M code for a broad range of Power Query functions without the UI (I have made my first faltering footsteps into this level and have set a goal that by the end of August this year (using Chris Webb’s book, “Power Query for Power BI and Excel” as my personal guide) to work through this level on my Power Pivot journey.

    Level III: Beginning mastery of Power Query’s M language (could take 9 months to a year to work through).

    Level IV: Beginning guru. Maybe I write a book about a journey from writing procedural code (like VBA) and/or declarative code (like T-SQL) to writing M code (definitely a different experience). Maybe nobody else reads it, but I would make it a great reference for me.

    If Rob would consider adding a forum for Power Query on the powerpivotpro website, at the end of August I would ask if I could be a moderator there.

    A link to a recent post for using Power Query that I found helpful, especially the insights on changing data types is

    http://powerpivotpro.com/2015/06/counting-overlapping-twitter-or-facebook-instagram-etc-followers/#more-13213

    1. Hi Thomas, thanks for the comments and encouragement. And good luck with your personal journey of skill development. I also have aspirations however I doubt I will get passed Level II in your description. In my case it is the 80:20 rule – I get 80% of the benefit from 20% of the learning. I don’t think I could personally get value from investing in levels II and IV, but of course that is me. One challenge I see with going to levels III/IV is that of ‘having enough work to sustain the practice and hence develop the skill’. I find it a bit like learning to speak another language – if you don’t have practice, no amount of physical work you have to put in will make it happen. Having said that, after 30 years I know a lot more about Excel than I thought I ever would.

      You are welcome to join http://powerpivotforum.com.au which has a dedicated Power Query forum, and I would LOVE to have help answering questions. It needs both more questions and more answers to make this fledgling community reach critical mass.

  5. You had me at:

    > If at any time I need to edit the query, I can simply do the following.
    >
    > Put a double slash in front of line 1 //(mySheet)=>
    > Remove the double slash from line 4
    > Put a double slash in front of line 5

    (-:

    Normally I like to write my functions into my queries so it’s all in one place (unless I expect to call the function somewhere else); however this makes tweaking that function _so_ much easier I’m definitely going to start adopting it. I’d tried to similar things but they didn’t work great and weren’t nearly this elegant.

    Another thing I’ve found after doing a bunch of these was to ignore my initial inclination to stick as many of the cleaning transformations as possible into the function. While this makes sense conceptually, placing transformations after the function and merge steps makes troubleshooting way easier. Though you may have more experience here and know better on that front too 🙂

    Great stuff, thanks for posting!

  6. I’ve been working on similar things in PQ this week. The addition to your pattern is how to get the sheet names of the current workbook without hardcoding the filename. Use this code, plus add a Defined Name to Excel as listed in the comment:
    let fImportTab=(tabname)=>
    let
    //Gets file name and path of CurrentWorkbook from Excel Defined Name of “FullName”,
    //defined as =SUBSTITUTE(MID(CELL(“filename”,A1),1,FIND(“]”,CELL(“filename”,A1))-1),”[“,””)
    FullFilePath = Excel.CurrentWorkbook(){[Name=”FullName”]}[Content]{0}[Column1],
    Source = Excel.Workbook(File.Contents(FullFilePath)),
    #”Data” = Source{[Item=tabname,Kind=”Sheet”]}[Data]
    in
    #”Data”
    in
    fImportTab

    1. Mike,

      I believe I remember you from an “Ask Your Questions Here” section at another website. We were trying to sort out some details with DAX Query and there was another guy, Chris Gilbert, who had some interesting insights as well.

      Knowing that you have interesting topics to bring up or to ask about, I wanted to make sure you knew there are also Q & A type forums here at Powerpivotpro for which you can register at:

      http://forum.powerpivotpro.com/forums/

      Avi Singh and Rob launched the forums at the end of March this year (2015) and the Power Pivot forum has just crossed milestones for topics posted, 101 as of now, and posts, 650 as of now. Although the forum is called “Power Pivot”, anything Power BI (including Power Query) is an open topic.

      A special plus to the forums is that you can upload/download workbooks and files (not just images) with your questions and comments.

      Topics and posts are also listed in an orderly, chronological way that makes it easy to follow the flow of individual threads In addition, there is a master index that lists all topics to-date.

  7. Hi Matt, this is an excellent article and helps with a big business need in my company. One question, when I edit the entire Worksheet (to generate the list of sheet names), I am unable to see the Item, Kind, and Hidden columns using my file. (I only see Name and Data.) Do you know what may cause this, as I’m unable to pass the “Item” field into my function. Thanks Matt.

      1. Thanks Matt! The update worked. Another solution I found was to use Jan_Sheet = Source{[Name=mySheet]}[Data] since “Name” was an available column to pass into the function.

  8. Dear Matt
    thank you very much for this post, it think it’s quite smart and saved me a lot of “non added value work”.
    Just a quick remark : if we use an Excel file type of format “xls” (type of Excel 93/2007) when importing it into PQ the “sheet” objects will be recognized as “table”. The function won’t work anymore as described in your post. The solution is quite easy : just save the file with the most recent format xlsx.
    Diego

    1. Hi Diego

      You raise a good point – that everyone should record the steps themselves based on their own circumstances. I have provided an approach above that can be used in many different ways. It sounds like xls files are slightly different, but the general approach for Power Query consolidation will work across many different objects.

  9. Hi Matt
    as I am using heavly your technique I came across a couple of issues. I have a file with several sheets. I creat the function and everything works fine.
    I then apply the fuction and load in several sheets. What happens is that I get an error message saying “DateFormat.Error: Invalid cell value ‘#DIV/0!” or “#REF!”. The error is quite clear : somewhere in the sheets there are cells not properly formatted.

    The big question is where are they? Is there a way to quickly indetify what sheet is having the issue? I tried with the option of “removing errors” from each sheet but it didin’t work. I also tried to remove formulas from the original files by pasting values, but no way.

    Also when I edit the error table automatically generated by PQ, there is no info on how to locate this type of error. The error description is pretty criptic.

    As a result of this, some sheets are skipped…which is of course very annoying.

    Cheers,
    Diego

    1. I have personally experienced exactly the same thing. I don’t have a solution unfortunately. In my case I was able to add an additional sheet inside the source workbooks. What I did was create a blank sheet, in cell a1 of the new sheet I entered “=a1” of the sheet that actually contains the data. Then I wrapped this inside an if statement to trap for the errors, and copied the formula across and down, then hid the sheet. This may not be practical in you case, particularly given there are multiple sheets in your workbook.

      When this happened to me, I also found that the remove error feature didn’t help. I wasn’t able to send my workbook to Microsoft due to the sensitive nature of the data – hence I never followed it up further.

      If you are able to share a sheet with Microsoft, I would suggest you send a frown and ask them to look at it. My experience is they are very responsive.

  10. Hi Matt, this post has been super useful to me, as I have been using it to consolidate sheets in workbooks containing >250 sheets. Combining them manually would surely drive me to drink.

    I do have one question though – is it possible to achieve the same goal without having to specify the file location? I found that if I save the file anywhere else, or if I share it with someone, the code doesn’t work any more because the function looks for the file in its original location, i.e. where it was when the function was written. I’m finding this to be somewhat of a hurdle.

    André

    1. I actually solved my own problem, by replacing the original Source line code simply with =Excel.CurrentWorkbook(). After filtering out all unwanted tables, and, provided (before expanding), the shape of all the tables in the list are exactly the same, Bob’s your uncle.

      I should also say that each sheet in my workbook has a column that uniquely identifies it, so I don’t need the sheet names, nor did I have to write a function.

  11. I can’t make this work, even using your example workbook. Every time I try to load the custom column it generates an “Error” message in each cell of the custom column for each row. Here’s the error message:

    An error occurred in the ‘’ query. Expression.Error: The key didn’t match any rows in the table.
    Details:
    Key=Record
    Table=Table

    Using Excel 2016.

    Any ideas?

Leave a Comment or Question