Conditional Joins Banner update

Conditional Joins Banner update

Business Problem:

In a recent project with one of our clients, we were attempting to categorize website visit logs, based on the visitor’s landing page. The client had been maintaining a table of KeyPhrases that correspond with Categories. If the Visitor’s landing page contained the KeyPhrase, we wanted to match it to a category.

We needed to match two tables when the text value in the first table contains the text value from the second table. Essentially, we wanted to Join two tables together but didn’t want to use an exact match on field values. We wanted to use some type of logic. We didn’t want to use fuzzy merge, because we needed to be explicit with our business logic.

The secret to this technique revolves around a pretty simple pattern using the Table.SelectRows() function.

You can download a copy of this workbook to follow along with here.

The process we’re going to implement is as follows:

  1. Create a Custom Column on our DataTable
  2. Use the Table.SelectColumns() function in that Custom Column
  3. Implement logic with that function to return a table or list object of values from our LookupTable

Here’s the Pattern:

Table.SelectRows( LookupTable , (TableParameter) => your true / false logic using TableParameter[YourField] )

What’s going on here!? Get ready, we’re about to dive deep into the syntactical waters of the M language!

The syntax for this function is as follows:

Table.SelectRows( table as table, condition as function ) as table

Filtering a table in PowerQuery generates a Table.SelectRows() function for that Applied Step. You’ve probably seen the each keyword in the resulting code, such as:

= Table.SelectRows(Source, each [VisitID] < 100)

each is the Syntactical Sugar for the following parameter declaration and goes-to operator: (_) =>

The previous function is equivalent to the following:

= Table.SelectRows(Source, (_) => [VisitId] < 100)

So, what’s happening is the table argument Source in the Table.SelectRows() is being declared as a parameter that goes to the following condition as function. Instead of the Syntactical Sugar, we will replace each with (TableParameter) =>


The explicit declaration of the parameter allows us to use the field access operators [] to get a specific column from TableParameter, and to avoid errors from object type conversion and circular dependency.

Let’s put it to the test!

We have two tables; Visits and Categories. Our Visits table is a log of visits to the P3 Adaptive website. Our Categories table has a KeyWord column associated with a Category.

Visits Table:

Visits Table

Categories Table:

Categories Table

We want to associate a visitors LandingPage with a Category, based on the KeyPhrase. We could do a Merge Queries on the URL, Expand the KeyPhrase column, and write a custom column with Text.Contains(), and then filter the list. For our client’s actual use case, we would have needed many iterations of Merging, Custom Columns, and Filtering.

We’d rather do this in one step!

Add a custom column to the Visits Table and use the following code:

Table.SelectRows( Categories , (Magic) => Text.Contains( [LandingPage] , Magic[KeyPhrase] ) )

Custom Column

Expand the VisitCategory Column from the Custom Column

Expand Visit Category

Visit Category

WHAT!!!!!

The (M)agic is that for every row in Visits, we filter Categories to where the LandingPage contains the KeyPhrase. The result is a TableObject with only the matching rows from Categories. This is similar to row context transition in DAX, using the EARLIER() function! Note: The name of the parameter does not affect the value of the Parameter, it will inherit the previously declared table in the function. In this example, categories. You DO need to use the name of the parameter in conjunction with a field access operator []. Magic[KeyPhrase], in this example.

Bonus Points, Returning a List Object instead of Table

When using this method, I prefer to return a List Object to my custom column, as opposed to the Table Object. To do this, I’ll make two alterations to the custom column’s code.

First, I use the field access operators [] after my Table.SelectRows function to indicate the column I want to return. I’m only interested in the VisitCategory column and do not have a need to return the entire table in this case. Such as Table.SelectRows(…)[VisitCategory]

Second, I’ll wrap my Table.SelectRows() function with the List.Distinct() function. This will return a list with no duplicates for VisitCategory. A LandingPage may have multiple KeyPhrases for the same Category and there is no value in listing them more than once.

Here’s the final custom column function.

VisitID

With a List Object, I can choose either to “Expand to New Rows” OR “Extract Values”

Expand to new rows

Expand to New Rows will give a result similar to a merge, where I will list a Visit on more than one row if it matches more than one Category.

list visit

Extract Values will retain only one row per Visit, concatenating all the matches into a single text string with a delimiter of your choice.

extract vlues

dax power query

The decision here depends on your use case.

Final Speed Tweak, Table.Buffer()

Shout Out to Chris Webb and his [link removed due to 404] post on using DAX Studio and SQL Server Profiler to get durations for your data model refresh. After we had our custom column solution implemented, while it was faster than the previous approach, we were still underwhelmed with the refresh performance of our Data Model.  The solution here was to buffer a copy of the Categories table within our Visits query. We then used the buffered table in the Table.SelectRows() function in place of the Categories table.

Here’s the M code for our final Visits Table Query.

let

Source = WorkbookConnection,

Visits = Source{[Name="Visits"]}[Content],

ChangeTypes =

Table.TransformColumnTypes(

Visits,{

{"VisitID", Int64.Type}, {"BaseURL", type text}, {"LandingPage", type text}

}

),

BufferCategoryTable = Table.Buffer(Categories),

AddMagicColumn =

Table.AddColumn(

ChangeTypes, "Category",

each

List.Distinct(

Table.SelectRows(

BufferCategoryTable, (Magic) =>

Text.Contains( [LandingPage] , Magic[KeyPhrase] )

)[VisitCategory]

)

),

ExtractValuesAsList =

Table.TransformColumns( AddMagicColumn, {"Category", each Text.Combine(List.Transform(_, Text.From), ","), type text})

in

ExtractValuesAsList

We saw an immediate and dramatic improvement in refresh speed of the Visits Table.

Wrapping it up:

This pattern will work with any logic. Comparing text, numbers, dates, anything that can resolve to a TRUE or FALSE answer. Think about using this solution where you currently perform the following sequence of steps:

  1. Merge Queries
  2. Expand Column(s)
  3. Add Custom Column(s)
  4. Filter Custom Column(s)

Using logic more complex than matching values to associate datasets together is a common task. This pattern opens many possibilities and is easy to learn. How do you plan to incorporate this technique? Have you used a different approach?

Forget bending spoons with your mind – there’s no money in it.

It takes a special kind of mindset to “bend” data (and software!) to the human will.  As this article demonstrates, we at P3 Adaptive can twist Power BI into a pretzel if that’s what an organization needs. (A robust, trustworthy, industrial-strength pretzel of course).

The data-oriented challenges facing your business require BOTH a nimble toolset like Power BI AND a nimble mindset to go with it. And as Val Kilmer / Doc Holladay once said, we’re your huckleberry.

Connect with the Experts