skip to Main Content

Conditional Joins Banner update

Business Problem:

In a recent project with one of our clients, we were attempting to categorize website visit logs, based on the visitor’s landing page. The client had been maintaining a table of KeyPhrases that correspond with Categories. If the Visitor’s landing page contained the KeyPhrase, we wanted to match it to a category.

We needed to match two tables when the text value in the first table contains the text value from the second table. Essentially, we wanted to Join two tables together but didn’t want to use an exact match on field values. We wanted to use some type of logic. We didn’t want to use fuzzy merge, because we needed to be explicit with our business logic.

The secret to this technique revolves around a pretty simple pattern using the Table.SelectRows() function.

You can download a copy of this workbook to follow along with here.

The process we’re going to implement is as follows:

  1. Create a Custom Column on our DataTable
  2. Use the Table.SelectColumns() function in that Custom Column
  3. Implement logic with that function to return a table or list object of values from our LookupTable

Here’s the Pattern:

Table.SelectRows( LookupTable , (TableParameter) => your true / false logic using TableParameter[YourField] )

What’s going on here!? Get ready, we’re about to dive deep into the syntactical waters of the M language!

The syntax for this function is as follows:

Table.SelectRows( table as table, condition as function ) as table

Filtering a table in PowerQuery generates a Table.SelectRows() function for that Applied Step. You’ve probably seen the each keyword in the resulting code, such as:

= Table.SelectRows(Source, each [VisitID] < 100)

each is the Syntactical Sugar for the following parameter declaration and goes-to operator: (_) =>

The previous function is equivalent to the following:

= Table.SelectRows(Source, (_) => [VisitId] < 100)

So, what’s happening is the table argument Source in the Table.SelectRows() is being declared as a parameter that goes to the following condition as function. Instead of the Syntactical Sugar, we will replace each with (TableParameter) =>


The explicit declaration of the parameter allows us to use the field access operators [] to get a specific column from TableParameter, and to avoid errors from object type conversion and circular dependency.

Let’s put it to the test!

We have two tables; Visits and Categories. Our Visits table is a log of visits to the p3 website. Our Categories table has a KeyWord column associated with a Category.

Visits Table:

Visits Table

Categories Table:

Categories Table

We want to associate a visitors LandingPage with a Category, based on the KeyPhrase. We could do a Merge Queries on the URL, Expand the KeyPhrase column, and write a custom column with Text.Contains(), and then filter the list. For our client’s actual use case, we would have needed many iterations of Merging, Custom Columns, and Filtering.

We’d rather do this in one step!

Add a custom column to the Visits Table and use the following code:

Table.SelectRows( Categories , (Magic) => Text.Contains( [LandingPage] , Magic[KeyPhrase] ) )

Custom Column

Expand the VisitCategory Column from the Custom Column

Expand Visit Category

Visit Category

WHAT!!!!!

The (M)agic is that for every row in Visits, we filter Categories to where the LandingPage contains the KeyPhrase. The result is a TableObject with only the matching rows from Categories. This is similar to row context transition in DAX, using the EARLIER() function! Note: The name of the parameter does not affect the value of the Parameter, it will inherit the previously declared table in the function. In this example, categories. You DO need to use the name of the parameter in conjunction with a field access operator []. Magic[KeyPhrase], in this example.

Bonus Points, Returning a List Object instead of Table

When using this method, I prefer to return a List Object to my custom column, as opposed to the Table Object. To do this, I’ll make two alterations to the custom column’s code.

First, I use the field access operators [] after my Table.SelectRows function to indicate the column I want to return. I’m only interested in the VisitCategory column and do not have a need to return the entire table in this case. Such as Table.SelectRows(…)[VisitCategory]

Second, I’ll wrap my Table.SelectRows() function with the List.Distinct() function. This will return a list with no duplicates for VisitCategory. A LandingPage may have multiple KeyPhrases for the same Category and there is no value in listing them more than once.

Here’s the final custom column function.

VisitID

With a List Object, I can choose either to “Expand to New Rows” OR “Extract Values”

Expand to new rows

Expand to New Rows will give a result similar to a merge, where I will list a Visit on more than one row if it matches more than one Category.

list visit

Extract Values will retain only one row per Visit, concatenating all the matches into a single text string with a delimiter of your choice.

extract vlues

dax power query

The decision here depends on your use case.

Final Speed Tweak, Table.Buffer()

Shout Out to Chris Webb and his [link removed due to 404] post on using DAX Studio and SQL Server Profiler to get durations for your data model refresh. After we had our custom column solution implemented, while it was faster than the previous approach, we were still underwhelmed with the refresh performance of our Data Model.  The solution here was to buffer a copy of the Categories table within our Visits query. We then used the buffered table in the Table.SelectRows() function in place of the Categories table.

Here’s the M code for our final Visits Table Query.

let

Source = WorkbookConnection,

Visits = Source{[Name="Visits"]}[Content],

ChangeTypes =

Table.TransformColumnTypes(

Visits,{

{"VisitID", Int64.Type}, {"BaseURL", type text}, {"LandingPage", type text}

}

),

BufferCategoryTable = Table.Buffer(Categories),

AddMagicColumn =

Table.AddColumn(

ChangeTypes, "Category",

each

List.Distinct(

Table.SelectRows(

BufferCategoryTable, (Magic) =>

Text.Contains( [LandingPage] , Magic[KeyPhrase] )

)[VisitCategory]

)

),

ExtractValuesAsList =

Table.TransformColumns( AddMagicColumn, {"Category", each Text.Combine(List.Transform(_, Text.From), ","), type text})

in

ExtractValuesAsList

We saw an immediate and dramatic improvement in refresh speed of the Visits Table.

Wrapping it up:

This pattern will work with any logic. Comparing text, numbers, dates, anything that can resolve to a TRUE or FALSE answer. Think about using this solution where you currently perform the following sequence of steps:

  1. Merge Queries
  2. Expand Column(s)
  3. Add Custom Column(s)
  4. Filter Custom Column(s)

Using logic more complex than matching values to associate datasets together is a common task. This pattern opens many possibilities and is easy to learn. How do you plan to incorporate this technique? Have you used a different approach?

Forget bending spoons with your mind – there’s no money in it.

It takes a special kind of mindset to “bend” data (and software!) to the human will.  As this article demonstrates, we at PowerPivotPro can twist Power BI into a pretzel if that’s what an organization needs. (A robust, trustworthy, industrial-strength pretzel of course).

The data-oriented challenges facing your business require BOTH a nimble toolset like Power BI AND a nimble mindset to go with it. And as Val Kilmer / Doc Holladay once said, we’re your huckleberry.

Connect with the Experts

Justin Mannhardt

Justin Mannhardt is a Principal Consultant with PowerPivotPro. He loves all things data, from ETL to Dashboards. Whether it's small turbocharged DAX and M victories or taking on enterprise-scale PowerBI deployments, he's sure to have a good time doing it. When he's not having fun doing data things, you'll find him grilling, mountain-biking, and saving the world from monsters with his 2 sons.

This Post Has 18 Comments
  1. Justin, job well done. I’m working through Collect, Combine, Transform book right now and this post syncs well with Chap 4 exercises.

  2. This is really cool Justin. Thank you for the “Wrapping it up” section. I was impressed with the pattern, but was trying to think of how I could use this.

    Then I realized I have a “merge, expand, add custom, filter said custom” pattern that can only be done in M (creating a dimCustomer table from 2 separate databases).

    This is exactly what I need.

    Thanks again!

    1. That’s great, Christopher! I think this is also a great approach when you can maintain changes in business logic just by maintaining the source table(s), rather than updating the filters in PQ.

  3. Really don’t understand your mindtextual ad algorithm by timing this post 🙂 I tried to categorize bank statement transactions based on keywords in Notes column last days and was able to end up only with kind of hard coded nested if conditions. And then bump and it works within one single row of M code using your approach.
    Thanks a lot!

  4. amazing post Justin, could you please elaborate more of that Magic paramter you have used instead of EACH? Is there a reference somewhere where I can ready more about it? I also have books from Chris Webb, Ken Pulse and Gil on Power Query (does these books have reference to this in them?)

    1. Hello, Prashant! Thanks for asking. Essentially, solving for ambiguity. The primary purpose of this use case is to project our Categories Table into Visits and to explicitly define tables and fields in the Table.SelectRows() function. A few things would occur if we replaced (Magic) => with EACH on this example. First, we would get an Expression.Error, as the function would be first seeking the [LandingPage] field in the Categories table, which doesn’t exist.

      While EACH and (_) => are syntactically equivalent, the behavior at runtime is not necessarily equivalent, dependent on the use. This is not very well documented, at least wholistically. The Power Query Formula Language Specification covers the main concepts involved. I’d recommend starting with section 6.4.2 Field Access. Ken has a great guest post that digs in deeper on the each keyword as well: https://www.excelguru.ca/blog/2018/01/09/each-keyword-power-query/

  5. Hi Justin, nice solution. How would you add one additional category “Other” containing all other records not having one of the keyphrases? Thanks

    1. Hello! I think the simplest approach here would be to add a step after you’ve expanded the category column to replace null and blanks with “Other”.

  6. Hi Justin,
    What are your thoughts on putting the table.buffer at the end of the Categories query. Table.Buffer(#”Changed Type”). Your Visits query then has one less step in that the AddMagicColumn now just references Categories instead of BufferCategoryTable. The change makes your Visits query more sequential.
    Regards,
    Mike

  7. Hi Justin,

    Question, how would you expand this to iterate over multiple columns? I’ve already modified this to check the cell value vs the text.contains and only return the first value in the list

    Value.Equals( [LandingPage] , Magic[KeyPhrase] )

    Table.TransformColumns ( AddMagicColumn, {“Category”, each List.First(_), Int64.Type})

    but I have a column of data that is, from a data model perspective, a mess. There’s no common delimiter between the various strings in the source column so I’m currently replacing all of the different delimiters with a ^ and then splitting the column into six separate columns. I need to be able to compare the value in each of the rows & split columns and have one value in the resulting new “Category” column

Leave a Comment or Question