Data preparation has become so easy and potent in Excel and Power BI. The user experience of the Query Editor of Power BI and Excel (Power Query Add-In, or Get & Transform in Excel 2016) is extremely rewarding. It changes the crucial yet tedious and repetitive tasks into an exciting and even fun activity.  Just as important, it’s a process which produces an automated pipeline for your ever-changing reporting requirements. Thanks to Power Query, the modern data wrangler can become a Self-Service ETL power house. As you move up on your journey to become a super data wrangler, it’s time to be aware of the most common mistakes you do in Power Query, and how to avoid them.

As you build your query using the Query Editor, each transformation step generates a formula, or a line of code. The sequence of formulas are generated as a script (In the language which is unofficially called “M”, or Power Query Formula Language) which allows Excel and Power BI to follow the transformation steps that you’ve defined in the user interface, and automate it whenever you refresh your report.  While it’s not necessary to learn M to achieve 80% of your data transformations, it is extremely useful to be aware of it, and become confident enough to apply some lightweight touches to the formula and avoid common mistakes.

The common mistakes, which I also call Pitfalls, are almost rooted in the basic design of Power Query – by designing a system so robust and complete, they inadvertently left a few of these “pits” lurking. Only practice, awareness and education will take you out of the pitfalls, and train you to avoid them in the future.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

 

The Root cause:  Preview

So, what are those pitfalls, and why do we they exist? Data transformation in Power Query is always based on a snapshot of the data, and usually on a limited preview of it (e.g. 200 first rows). Each step you apply in the UI of the Query Editor is converted into a formula, which is heavily dependent on the preview/snapshot format . However, when real data starts deviating from the snapshot data, your queries will fail to refresh, or even worse, lead to incomplete or invalid data in your reports, which may ultimately lead to data-biased business decisions.

I’ve recently completed a series of blog posts here, that walk you through each of the 10 pitfalls, and I thought it would be best to wrap up this series here on PowerPivotPro.com. So let’s quickly go over the 10 Pitfalls in Power Query, and allow you to drill down to the original posts, whenever you are ready to learn it in more details, and avoid those pitfall.

 

Pitfall #1 – Your formula bar is inactive

If you don’t know what is the formula bar, that is alright. Open the Query Editor and check the Formula Bar checkbox in View tab. From now on, you will always have the formula bar visible. You can also enable the Formula Bar in the File tab –> Options & Settings –> Options –> Global –> Query Editor.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

Why is it so important to activate the formula bar? The formula bar will be your best guide that warns you when you reach a pitfall. By getting to know what to look for in the formula bar, even without knowing the formula syntax itself, you will be able to understand the weaknesses of your queries, and to be better equipped to fix them when they fail.

Learn more here.

Pitfall #2 – Auto-generated Changed Types step

Whenever you import a table, or perform steps like split columns, an auto-generated step will decide the column types for you, to ease your data cleansing experience. But this step contains a formula that refers to all the column names. When one of the columns will be renamed or get deleted in your external data source, you will have a refresh error. It’s nice to get a free lunch. But free meals are not always nutritious.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

The auto-generated Changed Type step is the #1 catalyst for refresh errors. You will do better by deleting this step, or tweak the formula to address only the crucial columns. Learn more here.

Pitfall #3 – Include vs. Exclude Filters

Whenever you apply a filter in the Query Editor, I encourage you to check the resulting formula bar and ensure the filter was applied on the correct values – Especially when you want to exclude values.

Let’s demonstrate it. Imagine, we would like to exclude all the rows with 1100 in Column 2 as highlighted below.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

In the Query Editor, we click the filter control on the right side of Column 2’s header, and start searching for values that starts with “11”. Two values pop up: 1100 and 1105.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

Now, we uncheck the value 1100, and click OK in the filter pane, expecting that it will filter out all the rows with 1100 in Column 2.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

Surprisingly, we find out that the filter logic was automatically tuned to include only values of 1150 in Column 2. Here is where Pitfall #3 happens. We can easily miss the wrong logic, and have the wrong data being propagated into our reports.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

To fix the wrong logic, you can change the formula from:

[Column 2] = 1150

To:

[Column 2] <> 1100

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

To learn more about the third pitfall, go here.


Interested in Learning How to Do this Kind of Thing?

power query

Our New Online Power Query Course just launched this year. You’ll learn Power Query from the best in the business, two Excel MVPs Ken Puls and Miguel Escobar. We’ve included 7 Hours of Self-Paced Video over 31 Courses, with lots of example files, and a Digital Badge powered by Acclaim.

CLICK HERE to claim your spot.


 Pitfall #4 – Column Reordering

Reordering columns in the Query Editor is a tricky business. You often do it for ad-hoc debugging purposes, unaware of its weaknesses. The reordering creates a formula that refers to all the columns in the table. When one of the column is missing, the refresh will fail. To avoid this pitfall, and reorder a subset of the columns, you can apply an advanced query function. Learn more here.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

 

Pitfall #5 – Removing and Selecting Column

Often times you remove unnecessary columns from your table using the Query Editor, instead of selecting the other columns you need to keep. To avoid refresh errors, when specific column names are missing, you can change the formula to refer to the column’s position instead of its name.

Instead of selecting the first two columns by explicitly refer to their names:

= Table.SelectColumns(Source,{“Column1”, “Column2”})

We can change the formula above to select the first two columns in a dynamic way, which will not fail when the column names change:

= Table.SelectColumns
(
Source,
List.FirstN(Table.ColumnNames(Source), 2)
)

Learn more here.

 

Pitfall #6 – Column Renaming

You import a table to Excel using the Query Editor, and find some cryptic column names, that call for an urgent rename. But hold on! Do you have a solid “Data Contract” with the owner of the external data source? Can you ensure that the columns will keep their current names? When you rename the columns in the Query Editor, you expose the query to a refresh failure that will surface when the column names in the external data source will change.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

In this article you can learn how to apply the query function below to rename columns according to their position in the table:

(Source as table, ColumnNamesNew as list, Indices as list) =>

let

    ColumnNamesOld = List.Transform( Indices, each Table.ColumnNames(Source){_} ),

    ZippedList = List.Zip( { ColumnNamesOld, ColumnNamesNew } ),

    #”Renamed Columns” = Table.RenameColumns( Source, ZippedList )

in

    #”Renamed Columns”

 

Pitfall #7 – Split Columns By Delimiter

This pitfall is one of my favorites. When you split columns by a delimiter, the Query Editor creates new columns to accommodate the split values

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

Unfortunately, the number of new columns, that is needed for the split, is decided during the evaluation of the preview data. When the external data will change, lots of crucial information may get lost. In the screenshot below, you can see an example of a table with participants in events. Each event is represented in a single row, with comma-separated participants in the third column. When I created the query, there were no more than three participants in an event. As a result, the query split the column Participants into three columns. Later, when the data changed, I completely lost the fourth participant end beyond in events with more than 3 participants. Read more here to find how to build the Better and Best queries.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

 

Pitfall #8 – Merge Columns

The reverse transformation of Split Column is less popular, but is quite handy, and simple to use. I often use it when I need to unpivot multiple columns in a nested table (For unpivot of nested table, read more here).

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

The pitfall lies in the fact that the merge step creates a complicated formula which will fail when the column names will change.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

In case you need to merge columns according to their position, and not by their names, you can follow this article to learn how.

 

Pitfall #9 – Expand Table Columns

Expand Table Column is a common transformation step that you can take advantage of when you merge between two queries, or import JSON or XML datasets. The Expand Table Column step occurs when you click on the highlighted button:

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

Expanding the Table columns activates the expand pane, which allows you to select the columns to expand. Unfortunately, you need to define the columns, and these columns are hard-coded into the formula.

The Ten Pitfalls of the Data Wrangler in Power BI and Power Query in Excel

Imagine we want to expand all the columns except Employee ID. When we uncheck Employee ID in the pane above, we get the following formula:

= Table.ExpandTableColumn(#”Merged Queries”, “NewColumn”, {“First Name”, “Last Name”, “Gender”}, {“First Name”, “Last Name”, “Gender”})

The columns First Name, Last Name and Gender are hard-coded, and this is our pitfall. If we need to extract other columns which are not presented in the preview, can we extract them dynamically? The answer is yes.

= Table.ExpandTableColumn(#”Merged Queries”, “NewColumn”, List.Difference(Table.ColumnNames(Employees), {“Employee ID”}))

Learn more here.

Pitfall #10 – Remove Duplicates and Lookup Tables

Last but not least, the tenth pitfall explains how to effectively create lookup table using the Remove Duplicates step in the Query Editor. How many times, did you create a look table using the Remove Duplicates in Power Query? As a lookup table requires a One-To-Many relationship with the relevant column in your fact table, it is imperative that you keep your values unique. Apparently, and this is why the tenth pitfall is so common, the Query engine and Power Pivot don’t share the same logic with regards to unique values. So removing duplicates is not sufficient to avoid the pitfall and avoid refresh error. Read more here.

 

Conclusions

Avoiding the ten pitfalls is a crucial step in becoming a successful data wrangler – One that can deliver production-ready robust queries. You can now follow the Pitfalls series whenever you encounter refresh errors or unexpected results in one of the 9 transformations we have described in this series. Creating robust queries that won’t fail you on the next refresh, can be easily achieved. And, you don’t really need to master M to avoid the pitfalls, and gain robust queries.

Hope you enjoy this series. Looking forward to learning about other pitfalls you have encountered.

  Subscribe to PowerPivotPro!
X

Subscribe

Gil Raviv

Principal Consultant at PowerPivotPro and a blogger at DataChant.com. Former Senior Program Manager on Microsoft Excel team. Led Power Query integration in Excel 2016. 

This Post Has 27 Comments

  1. Such an awesome post Gil! I’ve been following this on your blog, and you’ve systematically made my queries much more bulletproof. List.Union and Liat.Transform combo is POWERFUL.

    Thanks again for sharing your M expertise!

  2. Thanks for sharing. As you mention in your article, it is not difficult to author robust queries– The only challenge is being prepared to do it. This post is a great way to prepare.

  3. Wow. I dropped a project I was working on a few weeks ago due to refresh errors (column name changes in source data). I’ll be dusting that back off now.

  4. Awesome post! I had the changed type problem a few times and it took a few minutes for me to figure out that it was Power BI changing the type not me. Thanks! Now I feel like I have the keys to the city – or the rules for creating very useful templates.

    Great job!

  5. Ten alligators! My Pitfall guy has never had to swing over more than three at a time. Very daunting.

    Thank you for this post. I know I’ll be referring back to it and the underlying posts quite a bit in the future!

  6. Great post. It would have saved me a lot of headaches in the last few months and I’m sure that it will help me in the future, especiallt #7 and #10 that were still a mistery to me. Thanks.

  7. A major pitfall not covered here which affects me is the Path to Source files, e.g.

    = Csv.Document(File.Contents(“E:\OneDrive\Company\Sub Folder\SO_Info.csv”),[Delimiter=”,”, Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None])

    Why can we not use the Current folder, rather than using the Full File Path??

      1. Regarding Pitfall #4. Could a function be used to perform an action on every column OR on each column in turn using so that no matter what each Column is named the number could be used. In my case an action such as FillDown.

        1. To fill down multiple columns, there is no need to perform iteration. Assuming you have the offset (a zero-based index of the first column), and the count, you can apply this code:
          FilledDownColumn = List.Range(Table.ColumnNames(Source), offset, count),
          FilledDown = Table.FillDown(Source, FilledDownColums)

          1. Many thanks for the Tip. I have used your code, but nested together to form:
            #”Filled Down” = Table.FillDown(#”Previous Line”,List.Range(Table.ColumnNames(#”Previous Line”), 1))
            Looking forward to future Tips on your site.

  8. Great article. Some of these I have run into and figured out on my own; others i didn’t know about and great to have a heads up on. Thanks for writing it up.

  9. One I found useful in a planning tool that I created was the following, which checked if the Promo column exists, and created a null one if it did not.

    #”Column check” = if not(Table.HasColumns(#”Renamed Columns”, “Promo”)) then Table.AddColumn(#”Renamed Columns”,”Promo”, each null) else #”Renamed Columns”

    Since was possible that a sales person had no promotions, which after the depivot of their data, removing all the empty cells and repivoting to desired format

  10. HI Gill

    Thanks for sharing these tips.

    I used #9 to bring in all the columns when the columns can change. I couldn’t get your DataChant #9 solution (the one at the bottom of the post) to work using the List.Union – no doubt my fault in missing something,

    But I did get the exception code to work above to work – i just removed one of the columns i didn’t need, but it then it brings in all the other columns even if new ones are added – perfect.

    Thanks again.

    Neale

Leave a Comment or Question