
The scenario – source data sheets that are structured differently!

Does Power Query sometimes seem too rigid? It’s great for defining a sequence of transformations to clean your data but requires the incoming data to be in a consistent format. Today, I want to talk about a technique for dynamically handling sheets with different structures.

You can download a sample .pbix and the .xlsx workbook that feeds into it to follow along with me here and here.

Be sure to update the query (the Source step path) to point to your own local machine.

Let’s look at my sample data. Suppose my sales manager sends me a monthly sales report structured similarly to that below:

Sample Data

This table looks pretty clean, right? Using Power Query, you can load this sheet, remove the first two rows and columns, unpivot the data, and you have the numbers you’re looking for in tabular format.
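Before the layout changed, a static query was all you needed. Here's a minimal sketch of what that version might look like in M (the file path, sheet name, and step names are illustrative, not taken from the download):

    let
        Source = Excel.Workbook(File.Contents("C:\Reports\Unpivot Reports.xlsx"), null, true),
        #"Jan 2017" = Source{[Item="Jan 2017", Kind="Sheet"]}[Data],
        // Hard-coded assumptions that break the moment the layout shifts:
        #"Removed Top Rows" = Table.Skip(#"Jan 2017", 2),
        #"Removed Leading Columns" = Table.RemoveColumns(#"Removed Top Rows", {"Column1", "Column2"}),
        #"Promoted Headers" = Table.PromoteHeaders(#"Removed Leading Columns", [PromoteAllScalars=true]),
        #"Unpivoted" = Table.UnpivotOtherColumns(#"Promoted Headers", {"Region"}, "Salesperson", "Sales")
    in
        #"Unpivoted"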

Patting yourself on the back, you file the report away until February rolls around. The sales manager sends you the new numbers, you hit refresh, and… an error! Peeking at the data, you see that he changed the layout of the file and also added new salespeople to the matrix!

No leading columns!

Extra Rows!

As enjoyable as it would be just to tell the manager to stick to a consistent format, sometimes this isn’t realistic or feasible. So, let’s look at how we can use Power Query to handle this chaos:

  1.  Identify the header name for the column that demarcates the beginning of the matrix.
  2.  Automatically detect rows up to that header row.
  3.  Automatically detect columns up to that header column.
  4.  Remove the extra rows and columns, leaving you with a squeaky-clean dataset.
  5.  Unpivot the matrix for each sheet and append into a single dataset.

STEP 1: IDENTIFY THE TARGET HEADER ROW AND COLUMN

Looking back at our sample data, you can see that there is a common cell in all our sheets that lies at the upper left of the data we want to extract. In this case, the target cell is the “Region” header. We want to remove all rows ABOVE that cell and all columns LEFT of that cell without hard-coding the location of that cell.

STEP 2: LOCATE THE HEADER ROW DYNAMICALLY

We need some way to identify where the header row starts so that we can remove rows up to that point. This functionality is something I would have thought was built in by default, but surprisingly it is not! Luckily, our friend Ken Puls over at ExcelGuru.ca came up with a solution for this, which I've adapted slightly for our purposes.

Load your first worksheet into Power Query, add an Index column, and filter the table to the target value from step 1:

  1. Add Column > Index column > From 0.
  2. Add a custom step (click the fx button in the formula bar).
  3. Replace the formula with some custom M code: Table.FindText(#"Starting Table w/Index", "Region"). This assumes the step where you added the index is named "Starting Table w/Index"; if yours kept the default name, reference #"Added Index" instead. Name this step "Filtered to Header Row."
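Pieced together, the M for this step looks something like the fragment below (step names follow the prose; adjust them to your own query):

    // ...earlier steps load the worksheet into #"Starting Table"...
    #"Starting Table w/Index" = Table.AddIndexColumn(#"Starting Table", "Index", 0, 1),
    // Table.FindText returns every row that contains the target text anywhere in the table
    #"Filtered to Header Row" = Table.FindText(#"Starting Table w/Index", "Region")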

Load First Worksheet

Note that you'll want to replace the highlighted text with the Target Header name from step 1. Table.FindText() scans the entire table for the target text and returns any row containing that value. Be careful, though: it matches substrings, so make sure the target text doesn't appear anywhere else in your dataset (a stray "Region1" elsewhere would also match "Region").

Now we have our header row isolated, along with the Index value for that row. We'll come back to this "Filtered to Header Row" step shortly.

Filtered to Header Row

STEP 3: DETECT THE COLUMNS TO DELETE

Let’s move on to the more difficult part: dynamically removing leading columns. We have a bunch of columns, and we want to eliminate the first X, up to our target column. We’ll leverage the technique above and add some P3 secret sauce.

First, transpose the #"Filtered to Header Row" step and add an index. That turns our single-row table into a single-column table that we'll use to identify the columns to remove.

  1. Transform > Transpose.
  2. Add an index column: Add Column > Index column > From 1.
  3. To handle blank cells in the header row (always a possibility in dirty data), add a custom column, named "ColumnName", that swaps in a generated name wherever the header is null: Add Column > Custom Column > if [Column1] = null then "Column" & Number.ToText([Index]) else [Column1].
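In M, those three actions come out roughly like this (step names are illustrative):

    #"Transposed Table" = Table.Transpose(#"Filtered to Header Row"),
    #"Added Index1" = Table.AddIndexColumn(#"Transposed Table", "Index", 1, 1),
    // Substitute a generated name for any blank header cell
    #"Added ColumnName" = Table.AddColumn(#"Added Index1", "ColumnName",
        each if [Column1] = null then "Column" & Number.ToText([Index]) else [Column1])

The "Column" & Number.ToText([Index]) pattern is no accident: it mirrors the default Column1, Column2, … names Power Query assigns, so the generated names will still line up with the real table once we promote headers later on.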

Our goal is to delete all columns left of the “Region” column (or above “Region” in the transposed table) so let’s find the index of that row:

  1. Right Click Column1 > Text Filters > Equals > “Region”.

Text Filters > Equals > “Region”

We’re building upon Ken’s technique of finding the Index that corresponds to a target cell but this time with a transposed table. Since we’ll reference this number a couple of times, let’s Drill Down to just the Index number so that we have an easily referenceable step:

  1. Right-Click on the Index value > Drill Down.
  2. Rename that step to “TargetColumnIndex”.
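Those two clicks generate M along these lines (your filter step will carry whatever name Power Query assigned it):

    #"Filtered to Region" = Table.SelectRows(#"Added ColumnName", each [Column1] = "Region"),
    #"TargetColumnIndex" = #"Filtered to Region"{0}[Index]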

Now, jump back and reference the original column list, filtering it down to include ONLY the rows that have an index number less than the target column's index.

  1. Click the fx button to insert a New Step.
  2. Revise the M code to point to the full column list: = Table.SelectRows(#"Added ColumnName", each [Index] < #"TargetColumnIndex").

Let's break down what we're doing here: Table.SelectRows filters the #"Added ColumnName" table down to all rows that have an Index less than the "TargetColumnIndex" value we isolated a couple of steps ago.

Filter Rows

Finally, remove the helper columns keeping only “ColumnName,” and you have a nice list of columns to exclude!
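Putting the last two steps together, the column list comes out something like this (step names illustrative):

    #"Rows Before Target" = Table.SelectRows(#"Added ColumnName", each [Index] < #"TargetColumnIndex"),
    // Keep only ColumnName: a one-column table of the headers we want to delete
    #"Columns to Remove" = Table.SelectColumns(#"Rows Before Target", {"ColumnName"})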

STEP 4: REMOVE THE ORIGINAL EXTRA COLUMNS AND ROWS

We now have all the pieces we need to eliminate our junk rows and columns! Let’s jump back to our original query and clean it up.

Create a new step and change its code to reference the Starting Table:

  1. Click fx > rename the step to "Back to Starting Table" > change the code to = #"Starting Table".
  2. Home > "Remove Top Rows." Enter any value for the parameter.
  3. Edit the M code directly, changing the 2nd parameter of Table.Skip to reference the header row number we isolated earlier: Table.Skip(#"Back to Starting Table", #"Filtered to Header Row"[Index]{0}).
  4. Home > Use First Row as Headers.
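In M, that whole sequence is just three steps (names illustrative):

    #"Back to Starting Table" = #"Starting Table",
    // Skip exactly as many rows as sit above the header row, however many that turns out to be
    #"Removed Top Rows" = Table.Skip(#"Back to Starting Table", #"Filtered to Header Row"[Index]{0}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true])

One caveat: if Power Query automatically inserts a Changed Type step after promoting headers, delete it. It hard-codes the column names, which would defeat the dynamic behavior we're building.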

Create new step

Boom! We’ve dynamically eliminated the top rows!

Now for the final act, we'll tweak Table.RemoveColumns (when you do "Remove Columns," Power Query uses this function) to take a dynamic parameter! Remember that list of columns we generated earlier, the ones we want to remove? That's what we'll feed into Table.RemoveColumns.

First, select any one of the junk columns and right-click > “Remove Columns.” Take a look at the formula that Power Query generated.

  1.  Table.RemoveColumns(#"Promoted Headers",{"Column1"}).

We know that Table.RemoveColumns requires a list for its second argument, so we can reference the list of columns to remove from an earlier step:

  1. Table.ToList(#"Columns to Remove").
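The finished formula then looks something like this:

    #"Removed Junk Columns" = Table.RemoveColumns(#"Promoted Headers", Table.ToList(#"Columns to Remove"))

Table.ToList collapses each row of the one-column #"Columns to Remove" table into a text value, giving Table.RemoveColumns exactly the list it expects. (Plain field access, #"Columns to Remove"[ColumnName], would return the same list.)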

Now we’re left with a clean matrix which we can easily unpivot without any problems.

    1. Right-click the Region column > Unpivot Other Columns.
    2. Rename the columns to something useful.
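In M, the finish line looks roughly like this ("Salesperson" and "Sales" are just example names; use whatever suits your data):

    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Removed Junk Columns", {"Region"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",
        {{"Attribute", "Salesperson"}, {"Value", "Sales"}})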

Rename

STEP 5: CONVERT THE MAGIC QUERY INTO A FUNCTION

The final step is to convert your magic query into a function so that you can apply this dynamic transformation to each file/worksheet that needs to be unpivoted. A condensed sketch of what that might look like is below.
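The parameter and step names here are mine, and the downloadable file may differ in the details:

    (SheetData as table) as table =>
    let
        // Step 2: find the header row by locating the target text
        #"Starting Table w/Index" = Table.AddIndexColumn(SheetData, "Index", 0, 1),
        #"Filtered to Header Row" = Table.FindText(#"Starting Table w/Index", "Region"),
        // Step 3: transpose the header row and build the list of columns to remove
        #"Transposed Table" = Table.Transpose(#"Filtered to Header Row"),
        #"Added Index1" = Table.AddIndexColumn(#"Transposed Table", "Index", 1, 1),
        #"Added ColumnName" = Table.AddColumn(#"Added Index1", "ColumnName",
            each if [Column1] = null then "Column" & Number.ToText([Index]) else [Column1]),
        #"TargetColumnIndex" = Table.SelectRows(#"Added ColumnName", each [Column1] = "Region"){0}[Index],
        #"Columns to Remove" = Table.SelectColumns(
            Table.SelectRows(#"Added ColumnName", each [Index] < #"TargetColumnIndex"), {"ColumnName"}),
        // Step 4: drop the junk rows and columns from the original sheet, then unpivot
        #"Removed Top Rows" = Table.Skip(SheetData, #"Filtered to Header Row"[Index]{0}),
        #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
        #"Removed Junk Columns" = Table.RemoveColumns(#"Promoted Headers", Table.ToList(#"Columns to Remove")),
        #"Unpivoted" = Table.UnpivotOtherColumns(#"Removed Junk Columns", {"Region"}, "Salesperson", "Sales")
    in
        #"Unpivoted"

Invoke it once per worksheet and append the results; for example, = UnpivotSheet(Source{[Item="Feb 2017", Kind="Sheet"]}[Data]) if you name the function UnpivotSheet.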

To see how the final combined query looks with the function, please download this file here.

Conclusion:

Using the technique of identifying a target row and column, you can create extremely powerful and dynamic queries that can handle input files that aren't always perfect. Let's face it: you can't always rely on training or data validation to prevent end users from modifying your perfect templates. Our job as data modelers is to make the end-user experience as friendly as possible by foreseeing and handling exceptions.

If you have any ideas on how you might use this in your reports, please feel free to share in the comments section!

Come for the Techniques, Stay for the Business Value!

We get it:  you probably arrived here via Google, and now that you’ve got what you needed, you’re leaving. And we’re TOTALLY cool with that – we love what we do more than enough to keep writing free content.  And besides, what’s good for the community (and the soul) is good for the brand.

But before you leave, we DO want you to know that instead of struggling your way through a project on your own, you can hire us to accelerate you into the future. Like, 88 mph in a flux-capacitor-equipped DeLorean – THAT kind of acceleration. C’mon McFly. Where you’re going, you won’t need roads.

Narayana Windenberger

Narayana (pronounced “Na-ryan-a”) has a broad analytical background with 8+ years of experience in analyst roles requiring both business acumen and IT know-how. After experiencing first-hand how communication costs can cripple turnaround time in projects, he forayed into self-service BI by learning SQL and deploying SharePoint-based Excel dashboards. Skipping the traditional IT middle man, he became the reporting expert as Business Intelligence manager at FoodChain ID, a leader in food certification. More recently he has dived into the world of DAX, PowerPivot and Power BI and enjoys the “problem-solving thrill” that comes from implementing fully automated dashboards.

Narayana has a high standard of excellence, brings a positive attitude to the task at hand, and is deeply curious to understand how things work. In addition to holding an MBA, he is a Microsoft Certified Solutions Associate in BI Reporting and is a Microsoft Certified Excel Expert.

This Post Has 27 Comments
  1. Wow, this is an excellent post.

    Your explanation of using ‘M’ to solve a common issue (changing file structure) is practical and powerful.

    Thank you for sharing.

  2. Hi Narayana, this looks really useful. Any chance you can post the Unpivot Reports.xlsx file that is in the source? We can then follow with the pbix you posted. Thanks.

  3. Narayana:

    Great post. Going to be very useful.

    Really appreciated both files so we could “play at home”.

    Couple of suggestions:
    1. Query is labeled 2015 vs. the xlsx data, which reflects 2017. May want to rename to Jan 2017
    2. Source in query points to a file location on your PC. Not a big deal to change the pointer to our respective download location, but you might want to mention the need to modify the source location in the query.
    3. Part 2?
    a. update query to capture the Month & Year (cell A1) from the worksheet
    b. create function to extract data from other sheets in the workbook and append; excludes sheets that do not include data (i.e. notes/instruction tabs)

    Thanks for posting

    1. Hi Ham,

      Thanks for your thoughtful suggestions! I have updated the blog post and files as you recommended, and added a completed PBIX file in Step 5 that includes a function that combines all sheets into a single query.

      Enjoy!

  4. it won’t work for me. when i’m trying to put a formula, i got an error: ‘Expression.Error: ‘The name ‘Starting Table w/Index’ wasn’t recognized. Make sure it’s spelled correctly.’’. i tried to copy your formula also, but no use.

    1. You may need to either rename the step in which you added the index to “Starting Table w/Index” or change the “Starting Table w/Index” reference to whatever your step is after you add the index, likely #”Added Index”

  5. This is a post that is sorely needed, but unfortunately there seem to be some errors that are preventing dummies like me from getting the result. Can you update the post to be precise? If you don’t have time, no worries

  6. Narayana; very brainy!! Excellent;

    as suggested by Ham Barnes; could you please add a query showing how to capture the Month & Year (cell A1) from the worksheet

    Cheers!! Soni

      1. Present Condition of the Query is as below:

        Date , Column2, Column 3, Column 4

        28/3/18, Product No, Product, Catalog
        28/3/18, 10AXX10, Chairs, Soni1
        27/3/18, 10AXX10, Tables , Riser1

        Output

        Date , Product No, Product, Catalog
        28/3/18, 10AXX10, Chairs, Soni1
        27/3/18, 10AXX10, Tables , Riser1

        Kindly advise

          1. Hi Narayana; Promote headers: when promoting headers, I want to retain the name of the first column as “Date”

  7. I have another question: during the Promote Headers step, I don’t want to rename one or more of the existing column names. Please advise

    1. Thanks Narayana; your reply on promoting headers with date is very useful. I have a case as below; please advise.

      Present Condition of the Query is as below:

      Reference, Date , Column2, Column 3, Column 4

      R001, 28/3/18, Product No, Product, Catalog
      R001, 28/3/18, 10AXX10, Chairs, Soni1
      R001, 27/3/18, 10AXX10, Tables , Riser1

      Output

      Reference, Date , Product No, Product, Catalog
      R001, 28/3/18, 10AXX10, Chairs, Soni1
      R001, 27/3/18, 10AXX10, Tables , Riser1

      I am able to achieve this for Date based on your advice using PromoteAllScalars=false; how do I do the same for the “Reference” column header?

      Thanks

    2. That’s a tough one and I don’t immediately see an obvious solution since the Reference column stores text values. I think this would require thorough analysis which is beyond the scope of this comment section.

  8. How did you learn M? I am confused half the time when looking at it (and certainly cannot just figure it out on my own).

    1. I’d say I learned mostly by doing — using the out of the box transformations in PowerQuery (i.e. the buttons in the ribbon) and then seeing the code that is generated. In addition, I have experience with some other programming languages like Javascript, PHP, VBA, and SQL, which helps me understand the syntax and structure of the M formulas.

      If you’re starting from scratch you might consider reading “M is for (Data) Monkey: A Guide to the M Language in Excel Power Query.”

      Good luck!

  9. Hi Narayana,

    Thanks for this post which was very thought provoking as I’m trying to make my PowerQueries more robust to changes in the source files generally!

    I followed it through but didn’t find it intuitive so I challenged myself to streamline the process and apply the same steps to both rows and columns.

    This is what I came up with and I’d be interested in your thoughts in case there is any downside to doing it this way….

    Record.FieldValues(_) returns a list of the values in each field on the current row
    List.Contains(list, "Region") will then return true or false to indicate which rows contain it

    So I used 2 steps to do the bulk of the work :

    CheckWanted = Table.AddColumn(#"Previous Step Name", "Wanted", each List.Contains(Record.FieldValues(_), "Region"))

    SkipNotWanted = Table.Skip(CheckWanted, each [Wanted] <> true)

    Then transpose the table and repeat the same process to deal with the columns (which are now rows)

    Then transpose back again.

    It seems to work in fewer steps, and I prefer it because the process for handling extra columns is identical to the process for handling extra rows.

    Also the List.Contains is looking for an EXACT match on “Region” so it avoids the problems if another record contains the text Region1 for instance which would be an issue for Table.FindText

    Are there any downsides to doing it this way because so far it seems to be working just fine.

    Thanks

    1. I think posting has stripped out some characters it thought were HTML !

      The SkipNotWanted checks each [Wanted] IS NOT EQUAL TO true

      but its done with “less than” and “greater than” which have been stripped out of the posted comment
