By Dany Hoter

We always think about a data model with multiple tables and relationships between them as a big improvement over the common practice of combining tables using VLOOKUP expressions.
(Watch Video for more: PowerPivot Relationships are EASIER than VLOOKUP, not just faster)

I came across a customer request that forced me to admit that in some aspects a VLOOKUP is more flexible than relationships. To solve some specific scenarios we have to use modeling methods which are not elegant and not efficient.


VLOOKUP better than Relationships?

I hope that some of the readers will know to defend the cause of the data model and find a magic solution that is both elegant and efficient.

How can good old VLOOKUP be better?

A solution based on VLOOKUP is using a separate formula for every row, this is what makes it cumbersome, slow and error prone – right?

Yes but this is also what makes it more flexible in some cases.

Let’s describe some reasons why is it more flexible and give some examples:

VLOOKUP can use a different way to find the right row based on the input value

=IF(A5=””
, VLOOKUP (A6,lookuptable6,3,false)
, VLOOKUP (A5,lookuptable5,4,false))

If lookup value is empty in our table, there is no point in looking for it in the lookup table and instead a different lookup table is used and a different lookup value is searched for.

For example (download), if a state column is empty or contains “NA”, return a value from the countries table using the country as a search value.


VLOOKUP gives us the flexibility to lookup State or Country names

How would you solve this in a Power Pivot data model? Create a calculated column in one table that contains the state or the country and create a relationship with a table that contains both countries and states. Or let the relationship fail to find a matching row in the lookup table and apply the solution from the next paragraph.

The expression can make a decision on a row by row basis how to behave in cases the VLOOKUP doesn’t find a matching row.

IFERROR(VLOOKUP(A5,lookuptable5,3,false),”Other product”)

IFERROR(VLOOKUP(A5,lookuptable,3,false),A5)

IFERROR(VLOOKUP(A5,lookuptable5,3,false),VLOOKUP(A6,lookuptable6,4,false))

I can decide what to show for every column that is based on VLOOKUP in case it is not found.

In the first example I show a constant value.

In the second example I show the key itself

In the third case I perform another VLOOKUP if the first one fails.

What is the equivalent behavior of a relationship?

Any value that is coming from the 1 side of a relationship and is not found in the 1 side returns as blank().


(blank) values when using Lookup Tables
can cramp your style

This is not always friendly to the consumer of a report.

It may be expected that not all values in the many side are not found in the 1 side but blank is not always the best option to show instead.

What can we do to solve the problem

Exactly what we are trained not to do – create a calculated column in the many side.

Assume you have a column in the products table called [Product Name]. The table is on the 1 side of a relationship with the sales table.

You can create a column called [Product Name] in the sales table with the DAX expression

=IF(ISBLANK(RELATED(Product[Product Name])
,”Other Product”
,RELATED(Product[Product Name]))

Now we should hide the [Product Name] column from the products table and use the one in the sales table.

Elegant? – no

Efficient? – no

But this is the best solution I can find.

In the included Excel file I used both the VLOOKUP solution and the DAX solution in the same table

So I wouldn’t be seen stupid I ran this problem through one of the leading experts and he agreed that it is a real scenario so now I propose this as a challenge to both the experts and the team who owns the modeling experience.

Download All Files:
Example1 Example2 (Excel 2013)
Example (Excel 2010 Version)

  Subscribe to PowerPivotPro!
X

Subscribe

This Post Has 10 Comments

  1. Hi Dany! If I caught everything right… it’s a matter of data preparation. So why don’t you try Power Query to prepare your data before load it into model?

  2. I’m with Mer on this one.

    One of my recent problems was pulling out a large number of look up tables then referencing them back into the sales table. As you know that can get problematic with loading to data model modes. I struggled for some time with how to do it and do it right and finally was able to figure it out a few days ago.

    I pulled out my look up tables, indexed them, then merged my queries in PQ to find look ups to the values left in the columns I wanted to remove in the sales table. After finding the correct value, I deleted my unwanted column and was left with an index. Previously, I was using VLOOKUPs inside the tables in excel, which resulted in plenty of headaches when it got to the data model.

    Using PQ you can probably just Replace all of the (blank) fields with Other Products, which eliminates having to do DAX wizardry.

  3. Dany, I found your post interesting. Since I was never an excel power user until power pivot and power query came along, 🙂 I never used vlookup much and so not as skilled in its use. I really had to think through your post. I’m more from the programmer/database side and so I’ve really taken to the relational data model. Basically I’m used to cleaning data. So personally I have to agree with Mer. In situations like this I would go straight to power query.

    And now with Power query in PowerBI Designer being the front door data load and DAX being surfaced for the analysis and report building, shaping the data first with PQ before it even gets to the data model seems to be intuitive…to me. 🙂 With that said, Excel being so flexible there is usually more than one solution. That’s the strength of Excel. From a casual user of vlookup I would have not approached the problem from your angle. It is a new way for me to look at the issue you presented. Thanks for the post!

  4. Hey Dany, thanks for the post. I also agree with the Power Query comments above. I attended Chris Webb’s excellent Power Query session at PASS BA Conference this week, and Chris opened my eyes to Power Query as a tool to alert the data steward to problems in the source data via audit reports and queries. This is a good example of a problem that could use those techniques, and it would be easy to create replacement product code and description columns filled with the missing values. I tend to favour “unknown product” over “other product” as it sends a message that something needs to be fixed rather than this is a “minor” product.

    If only you knew someone that worked on the Microsoft Excel development team, you could ask to get a default setting for “blanks” as a configurable attribute for each lookup table 🙂

  5. Hi Dany, I agree with the previous contributions re use of Power Query to clean the data before you load to the model. If however, you are presented with a fate accompli and the offending data is already in your model, you can use a simple DAX Report to help solve the problem. This technique has been demonstrated in the past by Marco Russo & Alberto Ferrari.

    Using Existing Connections / Table, select the Products table and it will be returned to Excel in a new Table. Change the name of this Table to Products2. Then right click on the table and select Table / Edit. Change the Command Type from Table to DAX. Replace the Product Table name with the following DAX expression:

    SUMMARIZE (
    Sales,
    Sales[Product],
    Products[Product Name],
    “New Product Name”, IF (ISBLANK (Products[Product Name]),”Unknown Product “&Sales[Product],Products[Product Name])
    )
    ORDER BY Sales[Product]

    Click on Connections in the Data Tab and highlight LinkedTable_Products. Click on the link in the lower pane, Highlight the entry that contains Products2 and then click on the Properties… button.
    Check the Refresh Data when opening the file box to ensure that the new Product2 table is always up-to-date.

    Load Product2 back to the model using the PowerPivot linked table button. Now update the model to link the Sales Table to the new Product2 table and hide the original Product table.

    Note: there is one redundant column, the original Product Name, in the above. This can easily be hidden in the model.

    As an alternative you can use the following DAX expression … In which case there is no redundant Product Name column.

    SUMMARIZE (
    Sales,
    Sales[Product],
    “New Product Name”, IF (ISBLANK (LOOKUPVALUE(Products[Product Name],Products[Product],Sales[Product])),”Unknown Product “&Sales[Product], LOOKUPVALUE(Products[Product Name],Products[Product],Sales[Product]))
    )
    ORDER BY Sales[Product]

    The irony here is that the code for the “New Product Name” resembles VLookup!!

  6. Yes. I had a three-step VLOOKUP that categorized work orders according to three different characteristics of the transaction. If the Task was unique, it used the task; if the Task was “Other”, then it would use the Specialty (a higher level of organization); and if that wasn’t specific enough, it would use the first part of the description. Although I replicated the process in PowerPivot, it was just so much easier in Excel.

    1. I agree that in some cases Power Query can solve the problem easily and it will be more efficient than adding a calculated column to the fact table.
      Some more elaborate cases like Fred mentions are probably also doable with Power Query but will require writing M which I consider beyond the scope of most Power Pivot users.
      I also will not recommend the solution using the DAX query to create a query table in Excel feeding from the model. I’ve used this technique in the past and there are cases in which it is useful but again using DAX as a query language is not a common skill.
      It is ambitious enough to expect Excel users to learn to use DAX as an expression language and not add requirements for knowing DAX as a query language or M script.

    1. I was using variations of the original post to fix this problem but I have switched to SWITCH (thank you mcleanrobin). However I was using a calculated column in my huge sales data table to tell me if the store was a remodeled store (from my remodeled store list table) or a “residual” store (stores that are not in my remodeled stores list). However to get the calculated column out of 337M sales data table and into my All Stores table I went with the below variation which uses a disconnected table approach (since my All Stores table is not related to my remodeled store list table).

      SWITCH(
      TRUE(),
      LEN(LOOKUPVALUE(result_columnName, search_columnName,search value column) )>0, “Remodel”, “Residual”
      )

      This way my “Remodel or Residual” test is done on the All Stores table of 1000 something rows instead of the data table of 337M rows and I can run my filter on my All Stores table instead of my data table.

Leave a Comment or Question