
The scenario – source data sheets that are structured differently!

Does Power Query sometimes seem too rigid? It’s great for defining a sequence of transformations to clean your data but requires the incoming data to be in a consistent format. Today, I want to talk about a technique for dynamically handling sheets with different structures.

You can download a zip file with the updated sample .pbix and the .xlsx workbook that feeds into it to follow along with here.

Be sure to update the query (the Source step path) to point to your own local machine.

Let’s look at my sample data. Suppose my sales manager sends me a monthly sales report structured similarly to that below:

Sample Data

This table looks pretty clean, right? Using Power Query, you can load this sheet, remove the first two rows and columns, unpivot the data, and you have the numbers you’re looking for in tabular format.
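For reference, a static version of that cleanup might look something like this in M. Treat it as a sketch: the file path, sheet name, and output column names ("Salesperson", "Sales") are illustrative placeholders, not the exact steps from the sample file.

```
let
    // load the workbook and pick the January sheet (path and sheet name are placeholders)
    Source = Excel.Workbook(File.Contents("C:\Reports\Unpivot Reports.xlsx"), null, true),
    Jan = Source{[Item="Jan", Kind="Sheet"]}[Data],
    // hard-coded cleanup: this is exactly what breaks when the layout shifts
    #"Removed Top Rows" = Table.Skip(Jan, 2),
    #"Removed Leading Columns" = Table.RemoveColumns(#"Removed Top Rows", {"Column1", "Column2"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Leading Columns", [PromoteAllScalars=true]),
    // keep Region, turn the salesperson columns into rows
    Unpivoted = Table.UnpivotOtherColumns(#"Promoted Headers", {"Region"}, "Salesperson", "Sales")
in
    Unpivoted
```

The hard-coded Table.Skip(…, 2) and column list are what fail the moment the layout changes, which is the problem the rest of this post solves.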

Patting yourself on the back, you file the report away until February rolls around. The sales manager sends you the new numbers, you hit refresh, and… an error! Peeking at the data, you see that he changed the layout of the file and also added new salespeople to the matrix!

No leading columns!

Extra Rows!

As enjoyable as it would be just to tell the manager to stick to a consistent format, sometimes this isn’t realistic or feasible. So, let’s look at how we can use Power Query to handle this chaos:

  1.  Identify the header name for the column that demarcates the beginning of the matrix.
  2.  Automatically detect rows up to that header row.
  3.  Automatically detect columns up to that header column.
  4.  Remove the extra rows and columns, leaving you with a squeaky-clean dataset.
  5.  Unpivot the matrix for each sheet and append into a single dataset.


Looking back at our sample data, you can see that there is a common cell in all our sheets that lies at the upper left of the data we want to extract. In this case, the target cell is the “Region” header. We want to remove all rows ABOVE that cell and all columns LEFT of that cell without hard-coding the location of that cell.


We need some way to identify where the header row starts so that we can remove rows up to that point. This functionality is something I would have thought was built in by default, but surprisingly it is not! Luckily, our friend Ken Puls came up with a solution for this, which I’ve adapted slightly for our purposes.

Load your first worksheet into Power Query, add an Index column, and filter the table to the target value from step 1:

  1. Add Column > Index column > From 0.
  2. Add a custom step (click the fx button in the formula bar).
  3. Replace the formula with some custom M code: Table.FindText(#"Starting Table w/Index", "Region"). Name the step "Filtered to Header Row."

Load First Worksheet

Note that you’ll want to replace “Region” with the Target Header name from step 1. Table.FindText() scans the entire table for the target text and returns any row containing that value, so be careful that your dataset doesn’t contain that exact target word anywhere else!

Now we have our header row isolated, along with the Index value for that row. We’ll come back to this step shortly.
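Pieced together, the steps so far might look roughly like this in M (a sketch; your auto-generated step names may differ, e.g. "Added Index" instead of "Starting Table w/Index"):

```
#"Starting Table w/Index" = Table.AddIndexColumn(#"Starting Table", "Index", 0, 1),
// Table.FindText returns every row that contains the target text anywhere in the row
#"Filtered to Header Row" = Table.FindText(#"Starting Table w/Index", "Region")
```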

Filtered Header Row


Let’s move on to the more difficult part: dynamically removing leading columns. We have a bunch of columns, and we want to eliminate the first X, up to our target column. We’ll leverage the technique above and add some P3 secret sauce.

First, transpose the #”Filtered to Header Row” step and add an index. That turns a single-row table into a single-column table that we can use to identify the columns to remove.

  1. Transform > Transpose.
  2. Add an index column: Add Column > Index column > From 1.
  3. To handle blank columns in the header row (always a possibility in dirty data), add a custom column that checks if the column list has any nulls: Add Column > Custom Column > if [Column1] = null then "Column" & Number.ToText([Index]) else [Column1].
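In M, those three steps might look something like this (a sketch; step names are illustrative):

```
#"Transposed Table" = Table.Transpose(#"Filtered to Header Row"),
#"Added Index1" = Table.AddIndexColumn(#"Transposed Table", "Index", 1, 1),
// fall back to "Column1", "Column2", … when the header cell is blank
#"Added ColumnName" = Table.AddColumn(#"Added Index1", "ColumnName",
    each if [Column1] = null then "Column" & Number.ToText([Index]) else [Column1])
```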

Our goal is to delete all columns left of the “Region” column (or above “Region” in the transposed table) so let’s find the index of that row:

  1. Right Click Column1 > Text Filters > Equals > “Region”.

Text Filters > Equals > “Region”

We’re building upon Ken’s technique of finding the Index that corresponds to a target cell but this time with a transposed table. Since we’ll reference this number a couple of times, let’s Drill Down to just the Index number so that we have an easily referenceable step:

  1. Right-Click on the Index value > Drill Down.
  2. Rename that step to “TargetColumnIndex”.

Now, jump around a bit and reference the original column list and filter it down to include ONLY the rows that have an index number less than the target column.

  1. Click the fx button to insert a New Step.
  2. Revise the M code to point to the full column list: = Table.SelectRows(#"Added ColumnName", each [Index] < #"TargetColumnIndex").

Let’s break down what we’re doing here: the outer Table.SelectRows filters the table #”Added ColumnName” down to all rows that have an Index less than the “TargetColumnIndex” value we isolated a couple of steps ago.

Filter Rows

Finally, remove the helper columns, keeping only “ColumnName”, and you have a nice list of columns to exclude!
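Taken together, the column-hunting steps could be sketched like this in M (step names are illustrative):

```
#"Filtered to Region" = Table.SelectRows(#"Added ColumnName", each [Column1] = "Region"),
// Drill Down on the Index value yields a single number
#"TargetColumnIndex" = #"Filtered to Region"[Index]{0},
// keep only the column names sitting LEFT of the target column
#"Columns to Remove" = Table.SelectColumns(
    Table.SelectRows(#"Added ColumnName", each [Index] < #"TargetColumnIndex"),
    {"ColumnName"})
```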


We now have all the pieces we need to eliminate our junk rows and columns! Let’s jump back to our original query and clean it up.

Create a new step and change its code to reference the Starting Table:

  1. Click fx > rename step to “Back to Starting Table” > change code to = #”Starting Table”.
  2. Home > “Remove Top Rows.” Enter any value for the parameter.
  3. Edit the M code directly and change the 2nd parameter of Table.Skip: = Table.Skip(#"Back to Starting Table", #"Filtered to Header Row"[Index]{0}). Here we reference the step where we isolated the header row number earlier.
  4. Home > Use First Rows as Headers.
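In M, that row cleanup boils down to something like this (a sketch using the step names from this post):

```
#"Back to Starting Table" = #"Starting Table",
// skip exactly as many rows as sit above the header row we located earlier
#"Removed Top Rows" = Table.Skip(#"Back to Starting Table", #"Filtered to Header Row"[Index]{0}),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows")
```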

Create new step

Boom! We’ve dynamically eliminated the top rows!

Now for the final act, we’ll tweak Table.RemoveColumns (when you do “Remove Columns” Power Query uses this function) to use a dynamic parameter! Remember that list of columns we generated earlier, the list of columns we want to remove? That’s what we’ll feed into Table.RemoveColumns.

First, select any one of the junk columns and right-click > “Remove Columns.” Take a look at the formula that Power Query generated.

  1.  Table.RemoveColumns(#"Promoted Headers",{"Column1"}).

We know that Table.RemoveColumns requires a list for its second argument, so we can reference the list of columns to remove from an earlier step:

  1. Table.RemoveColumns(#"Promoted Headers", Table.ToList(#"Columns to Remove")).

Now we’re left with a clean matrix which we can easily unpivot without any problems.

    1. Right-click the Region column > Unpivot Other Columns.
    2. Rename columns something useful.
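The final cleanup, sketched in M (the "Salesperson"/"Sales" names are illustrative; pick whatever suits your data):

```
// feed the dynamic column list into Table.RemoveColumns
#"Removed Junk Columns" = Table.RemoveColumns(#"Promoted Headers", Table.ToList(#"Columns to Remove")),
// keep Region and unpivot the salesperson columns into rows
Unpivoted = Table.UnpivotOtherColumns(#"Removed Junk Columns", {"Region"}, "Salesperson", "Sales")
```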


STEP 5: Convert magic query to function

  1. The final step is to convert your magic query into a function so that you can apply this dynamic transformation to each file/worksheet that needs to be unpivoted.
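As a sketch, the whole pattern wrapped into a reusable function might look like this. Note two assumptions: this version takes a shortcut, using List.PositionOf on the promoted column names instead of the transpose technique above, and the function/parameter names are my own, not necessarily those in the downloadable file.

```
// fnCleanMatrix: dynamically trim and unpivot one worksheet
(SourceSheet as table, TargetHeader as text) as table =>
let
    // find the row containing the target header and skip everything above it
    AddedIndex = Table.AddIndexColumn(SourceSheet, "Index", 0, 1),
    HeaderRow = Table.FindText(AddedIndex, TargetHeader),
    SkippedRows = Table.Skip(SourceSheet, HeaderRow[Index]{0}),
    Promoted = Table.PromoteHeaders(SkippedRows, [PromoteAllScalars=true]),
    // everything left of the target header column is junk
    LeadingCols = List.FirstN(
        Table.ColumnNames(Promoted),
        List.PositionOf(Table.ColumnNames(Promoted), TargetHeader)),
    Removed = Table.RemoveColumns(Promoted, LeadingCols),
    Unpivoted = Table.UnpivotOtherColumns(Removed, {TargetHeader}, "Attribute", "Value")
in
    Unpivoted
```

Invoke it once per sheet, e.g. fnCleanMatrix(Jan_Sheet, "Region"), then append the results into a single dataset.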

To see how the final combined query looks with the function, please download this file here.


Using the technique of identifying a target row and column, you can create extremely powerful and dynamic queries that can handle input files that aren’t always perfect, because let’s face it, you can’t always rely on training or data validation to prevent end users from modifying your perfect templates. Our job as data modelers is to make the end user experience as friendly as possible by foreseeing and handling exceptions.

If you have any ideas on how you might use this in your reports, please feel free to share in the comments section!

EDIT: Reader Mike Ward posted a very insightful comment on a more elegant way to remove leading rows and columns. His method not only eliminates a few steps in the query but also applies the same process to removing leading rows as to leading columns. Furthermore, its use of List.Contains enforces an exact match on the Target Row/Column identifier. I recommend interested readers review his comment from August 3rd, 2018 and then review the query titled “Jan_MikeWard” in the updated PBIX file (download here).
Thank you for sharing, Mike!

Come for the Techniques, Stay for the Business Value!

We get it:  you probably arrived here via Google, and now that you’ve got what you needed, you’re leaving. And we’re TOTALLY cool with that – we love what we do more than enough to keep writing free content.  And besides, what’s good for the community (and the soul) is good for the brand.

But before you leave, we DO want you to know that instead of struggling your way through a project on your own, you can hire us to accelerate you into the future. Like, 88 mph in a flux-capacitor-equipped DeLorean – THAT kind of acceleration. C’mon McFly. Where you’re going, you won’t need roads.

Narayana Windenberger

Narayana (pronounced “Na-ryan-a”) has a broad analytical background with 8+ years of experience in analyst roles requiring both business acumen and IT know-how. After experiencing first-hand how communication costs can cripple turnaround time in projects, he forayed into self-service BI by learning SQL and deploying SharePoint-based Excel dashboards. Skipping the traditional IT middle man, he became the reporting expert as Business Intelligence manager at FoodChain ID, a leader in food certification. More recently he has dived into the world of DAX, PowerPivot and Power BI and enjoys the “problem-solving thrill” that comes from implementing fully automated dashboards.

Narayana has a high standard of excellence, brings a positive attitude to the task at hand, and is deeply curious to understand how things work. In addition to holding an MBA, he is a Microsoft Certified Solutions Associate in BI Reporting and is a Microsoft Certified Excel Expert.

This Post Has 35 Comments
  1. Wow, this is an excellent post.

    Your explanation of using ‘M’ to solve a common issue (changing file structure) is practical and powerful.

    Thank you for sharing.

  2. Hi Narayana, this looks really useful. Any chance you can post the Unpivot Reports.xlsx file that is in the source? We can then follow with the pbix you posted. Thanks.

  3. Narayana:

    Great post. Going to be very useful.

    Really appreciated both files so we could “play at home”.

    Couple of suggestions:
    1. Query is labeled 2015 vs. xlsx data which reflect 2017. May want to rename to Jan 2017
    2. Source in query points to file location on your PC. Not a big deal to change the pointer to our respective download location but
    you might want to mention the need to modify the source location in the query.
    3. Part 2?
    a. update query to capture the Month & Year (cell A1) from the worksheet
    b. create function to extract data from other sheets in the workbook and append; excludes sheets that do not include data (ie
    notes/instruction tabs

    Thanks for posting

    1. Hi Ham,

      Thanks for your thoughtful suggestions! I have updated the blog post and files as you recommended, and added a completed PBIX file in Step 5 that includes a function that combines all sheets into a single query.


  4. It won’t work for me. When I’m trying to put in a formula, I get an error: Expression.Error: ‘The name ‘Starting Table w/Index’ wasn’t recognized. Make sure it’s spelled correctly.’ I tried to copy your formula too, but no use.

    1. You may need to either rename the step in which you added the index to “Starting Table w/Index” or change the “Starting Table w/Index” reference to whatever your step is after you add the index, likely #”Added Index”

  5. This is a post that is sorely needed, but unfortunately there seem to be some errors that are preventing dummies like me from getting the result. Can you update the post to be precise? If you don’t have time, no worries

  6. Narayana; very brainy!! Excellent;

    as suggested by Ham Barnes; could you please add a query showing how to capture the Month & Year (cell A1) from the worksheet

    Cheer!! Soni

      1. Present Condition of the Query is as below:

        Date , Column2, Column 3, Column 4

        28/3/18, Product No, Product, Catalog
        28/3/18, 10AXX10, Chairs, Soni1
        27/3/18, 10AXX10, Tables , Riser1


        Date , Product No, Product, Catalog
        28/3/18, 10AXX10, Chairs, Soni1
        27/3/18, 10AXX10, Tables , Riser1

        Kindly advise

          1. Hi Narayana ; Promote headers: When promoting headers I want to retain the name the first column as “Date”

  7. I have another question; during the Promote Header; I don’t want to rename the existing one or more Column Name? Please advise

    1. Thanks Narayana; your reply on promoting headers with date is very useful. I have a case as below; please advise.

      Present Condition of the Query is as below:

      Reference, Date , Column2, Column 3, Column 4

      R001, 28/3/18, Product No, Product, Catalog
      R001, 28/3/18, 10AXX10, Chairs, Soni1
      R001, 27/3/18, 10AXX10, Tables , Riser1


      Reference, Date , Product No, Product, Catalog
      R001, 28/3/18, 10AXX10, Chairs, Soni1
      R001, 27/3/18, 10AXX10, Tables , Riser1

      I am able to achieve this for Date based on your advice using PromoteAllScalars=false; how to do the same for the “Reference” column header.


    2. That’s a tough one and I don’t immediately see an obvious solution since the Reference column stores text values. I think this would require thorough analysis which is beyond the scope of this comment section.

  8. How did you learn M? I am confused half the time when looking at it (and certainly cannot just figure it out on my own).

    1. I’d say I learned mostly by doing — using the out of the box transformations in PowerQuery (i.e. the buttons in the ribbon) and then seeing the code that is generated. In addition, I have experience with some other programming languages like Javascript, PHP, VBA, and SQL, which helps me understand the syntax and structure of the M formulas.

      If you’re starting from scratch you might consider reading “M is for (Data) Monkey: A Guide to the M Language in Excel Power Query.”

      Good luck!

  9. Hi Narayana,

    Thanks for this post which was very thought provoking as I’m trying to make my PowerQueries more robust to changes in the source files generally!

    I followed it through but didn’t find it intuitive so I challenged myself to streamline the process and apply the same steps to both rows and columns.

    This is what I came up with and I’d be interested in your thoughts in case there is any downside to doing it this way….

    Record.FieldValues(_) returns a list of the values in each field on the current row
    List.Contains(list,”Region”) will then return true or false to indicate which rows

    So I used 2 steps to do the bulk of the work :

    CheckWanted = Table.AddColumn(#"Previous Step Name", "Wanted", each List.Contains(Record.FieldValues(_), "Region"))

    SkipNotWanted = Table.Skip(CheckWanted, each [Wanted] <> true)

    Then transpose the table and repeat the same process to deal with the columns (which are now rows)

    Then transpose back again.

    It seems to work in less steps and I prefer it because the process for handling extra columns is identical to the process for handling extra rows.

    Also the List.Contains is looking for an EXACT match on “Region” so it avoids the problems if another record contains the text Region1 for instance which would be an issue for Table.FindText

    Are there any downsides to doing it this way because so far it seems to be working just fine.


    1. I think posting has stripped out some characters it thought were HTML !

      The SkipNotWanted checks each [Wanted] IS NOT EQUAL TO true

      but its done with “less than” and “greater than” which have been stripped out of the posted comment

    2. Hi Mike, that’s a very elegant solution! I updated the sample PBIX file with your approach and added a note in the article to that effect so that others can benefit from your idea. Great work!

  10. Hi Narayana –

    Awesome post, and thanks for editing to include Mike Ward’s solution!

    I needed to modify it a little because my table has 16k+ rows. Power Query only supports 16,384 columns, so the transpose step will fail. Here’s the logic I followed instead:

    1) Remove leading rows just as Mike did.
    2) Keep just the header rows with Table.FirstN(table, 1)
    3) Transpose THIS table, since a 1 row table will now just be 1 column
    4) Remove the leading “rows” again, just as Mike did
    5) Count how many “rows” were removed…do this with N = Table.RowCount(before removal) - Table.RowCount(after removal)
    6) Remove the first N columns from the resulting table in Step 2…a la Datachant’s method mentioned here: (remove the first N elements of the list of column names)
    7) Promote headers

    Here’s the M code. Note that I stored the value to search for as a step so that it’s “change once, apply many”:

    Source =
    ,Starting_Value = "Label" // change as needed

    // these first 3 steps are the same as Mike Ward's solution
    ,Add_HeaderCheck_ROW = Table.AddColumn(Source, "Wanted", each
    List.Contains(Record.FieldValues(_), Starting_Value))
    ,Skip_RowsAboveHeader_ROW = Table.Skip(Add_HeaderCheck_ROW, each
    [Wanted] <> true)
    ,Remove_HeaderCheck_ROW = Table.RemoveColumns(Skip_RowsAboveHeader_ROW, {"Wanted"})

    // keep only the first row to avoid transpose errors for large tables
    ,FirstRow_forColumns = Table.FirstN(Remove_HeaderCheck_ROW, 1)
    ,Transpose = Table.Transpose(FirstRow_forColumns)

    // these 2 steps are the same as Mike Ward's solution
    ,Add_HeaderCheck_COL = Table.AddColumn(Transpose, "Wanted", each
    List.Contains(Record.FieldValues(_), Starting_Value))
    ,Skip_RowsAboveHeader_COL = Table.Skip(Add_HeaderCheck_COL, each
    [Wanted] <> true)

    // count how many rows the previous step removed…this is how many columns to remove from the un-transposed table
    ,Rows_Removed = Table.RowCount(Add_HeaderCheck_COL) - Table.RowCount(Skip_RowsAboveHeader_COL)

    // uses Datachant's method of removing the first N columns from a table…without referencing column names
    ,Remove_NColumns = Table.RemoveColumns(Remove_HeaderCheck_ROW
    ,List.FirstN( Table.ColumnNames(Remove_HeaderCheck_ROW), Rows_Removed) )
    ,Headers = Table.PromoteHeaders(Remove_NColumns, [PromoteAllScalars=true])

  11. Glad you found my solution helpful 🙂

    Thanks for sharing your additional layer of complexity which is bound to help someone else.

    It’s good to share 🙂

  12. Hello, I am new to Power Query & trying to apply your code. However, I am getting an error as below

    Expression.Error: The name ‘Added ColumnName’ wasn’t recognized. Make sure it’s spelled correctly.

    The column name is correct in previous step, I am not getting why this throws this error


  13. Can you please let me know which sample file you’re using as well as the query and applied step returning the error?

  14. Very interesting solution that I think gets me close to what I am looking for. I have a more complicated file because the source system creates multiple sections of data in a non-repeating fashion.


    Season, KCA – November 2019
    Class, KCA Select Friday 4:30pm
    ,Customer, Age, Email, Tel
    ,Jon Doe,12,[email protected],xxx-xxx-xxxx
    ,Jane Smith, 13,[email protected],xxx-xxx-xxxx
    Class, KCA Select Friday 5:45pm
    ,Customer, Age, Email, Tel
    ,Jon Doe,12,[email protected],xxx-xxx-xxxx
    ,Jane Smith, 13,[email protected],xxx-xxx-xxxx

    What I would like to get as my end result is
    Class Month, Class Year, Class Day, Class Time, Customer, Age, Email, Tel
    November, 2019,Friday, 4:30,Jon Doe,12,[email protected],xxx-xxx-xxxx
    November, 2019,Friday, 4:30,Jane Smith, 13,[email protected],xxx-xxx-xxxx
    November, 2019,Friday, 5:45,Jon Doe,12,[email protected],xxx-xxx-xxxx
    November, 2019,Friday, 5:45,Jane Smith, 13,[email protected],xxx-xxx-xxxx

    Using your technique I know I can get the Class Month and Class Year. The problem I see is creating the Class Day and Time given they change as the file is processed. The long term challenge is we would want this to accept new files each month so we could start doing trend analysis.

    Any advice would be appreciated.


  15. Hi Steven, that is a more complex use case, but I took a few minutes to put this together. You can copy and paste this M code into a new query and you should be able to follow along with the steps I used. I hope that helps and if you need further assistance you can always reach out to us at [email protected]. We do this work all day long and can quickly help your company.

    let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCk5NLM7PU9JR8nZ2VNBV8MsvS81NSi1SMDIwtASKglCsTrSSc05icTFUVXBqTmpyiYJbUWZKYqWCiZWxQUEustLS4pL83NQiIN8xPRVIuuYmZuYA6ZDUHLACr/w8BZd8kIyhEZCoqKxySExK1kvOB5liamqqC8KGRsYmENWJeakKwbmZJRkgDcZAAqjYAagJTQNQvSkBp5pamZgOKqeiBb5LavJo4NPHqbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Season", each if [Column1] = "Season" then [Column2] else null),
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Season"}),
    #"Added Custom1" = Table.AddColumn(#"Filled Down", "Class", each if [Column1] = "Class" then [Column2] else null),
    #"Filled Down1" = Table.FillDown(#"Added Custom1",{"Class"}),
    #"Filtered Rows" = Table.SelectRows(#"Filled Down1", each ([Column1] = "Jane Smith" or [Column1] = "Jon Doe")),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"Column1", "Customer"}, {"Column2", "Age"}, {"Column3", "Email"}, {"Column4", "Tel"}}),
    #"Extracted Text After Delimiter" = Table.TransformColumns(#"Renamed Columns", {{"Season", each Text.AfterDelimiter(_, "- "), type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Text After Delimiter", "Season", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Season.1", "Season.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Season.1", type text}, {"Season.2", Int64.Type}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type1", "Class", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Class.1", "Class.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Class.1", type text}, {"Class.2", type time}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type2", "Class.1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Class.1.1", "Class.1.2"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Class.1.1", type text}, {"Class.1.2", type text}}),
    #"Renamed Columns1" = Table.RenameColumns(#"Changed Type3",{{"Class.1.2", "Class Day"}, {"Class.2", "Class Time"}, {"Season.1", "Month"}, {"Season.2", "Year"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns1",{"Class.1.1"})
    in
    #"Removed Columns"

  16. It was all going so well! Looking to apply these steps to a sheet of my own and I have gotten down to
    1. Right-Click on the Index value > Drill Down.
    2. Rename that step to “TargetColumnIndex”.

    However, when entering the code “= Table.SelectRows(#”Added ColumnName”, each [Index] < #"TargetColumnIndex")" to follow on, it is coming up with the below error:

    Expression.Error: We cannot apply operator < to types List and Number.

    Any advice on what went wrong? (Apologies, I realise I have posted moments ago but had put error in -that was from me trying to fix it! This was my original error)
