
Optimization has become one of my favorite topics. OK, I’ll admit it: it might be an obsession and a source of joy. My inability to optimize (queries, data models, and DAX) used to be a source of headaches and the occasional heart attack depending on what I was trying to accomplish. I make no claim that I’m an optimization expert, but I’ve spent hours wondering and researching why a data model’s calculation time could go from a couple of seconds to many minutes. This is a rare occurrence, but when it has happened, it’s happened when I’ve been on the verge of something great. Of course, if you’re taking business intelligence to its edge, you’re not adding 2+2—you’re trying to give someone an answer they (and you) thought was impossible. A recalculation that takes many minutes (especially if you’re not the end-user) is unacceptable and may cause your data model to crash—fatal exceptions anyone?

This article focuses on an optimization technique that I couldn’t find anywhere. Before I get into that technique, I’d like to touch on what I consider to be the basic tenets of optimization when working with data modeling and DAX.

Basic Tenets of Optimization

Cardinality, filtering, and VAR

There is some great information available on each of these topics, so I won’t go into too much detail, but it’s essential that they’re used in conjunction with one another. I’d also submit that building a solution that focuses on cardinality, filtering, and VAR in that order is helpful. The reason for this order is that cardinality applies to building an efficient data model; filtering is often one of the first tools used in DAX authoring; and VAR is a refinement of DAX. One could argue that cardinality also applies to DAX (e.g., iterator functions), or that filtering applies heavily to query methodology, which it does. Some may be able to pull VARs out of a hat, but I can’t—I often need to write a convoluted DAX formula to see the VAR potential and then rewrite it.

Cardinality

Put simply, the more unique values in a column, the greater the cardinality. For example, if there is a “yes/no” attribute, the cardinality of that column is low. If there is a unique transaction number for each of 100 million rows, cardinality is high. The basic optimization principle I focus on when building a data model is that, if I don’t need a column—especially if it creates high cardinality—I don’t even bring it into the data model. If I do need it, I should really need it (i.e., there is no other solution).
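
If you’re on the fence about a column, it’s worth checking how many distinct values it actually carries before committing to it (VertiPaq Analyzer in DAX Studio will also show you this per column). Here’s a minimal sketch, assuming a hypothetical Sales[TransactionID] column:

-- Quick cardinality check: how many distinct values does this column hold?
Transaction ID Cardinality :=
DISTINCTCOUNT ( Sales[TransactionID] )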

Filtering

I’m reminded almost every day of the importance of filtering. It sounds pedestrian, but it is far from it. Filtering is an art, and it takes time to develop an eye for it. It can be performed as part of your query or part of your DAX.
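
To make that concrete on the DAX side, here’s a minimal sketch of the same question answered with two filtering styles. The GL[Amount] column and the GL-to-Scenarios relationship are placeholder assumptions on my part, not taken from a specific model:

-- A filter argument on a single dimension column: cheap, and resolved
-- efficiently by the storage engine.
Actuals (Column Filter) :=
CALCULATE ( SUM ( GL[Amount] ), Scenarios[ScenarioName] = "Actual" )

-- FILTER over the entire fact table: forces a row-by-row scan of GL just to
-- apply the same condition, which gets expensive as the table grows.
Actuals (Table Filter) :=
CALCULATE (
    SUM ( GL[Amount] ),
    FILTER ( GL, RELATED ( Scenarios[ScenarioName] ) = "Actual" )
)

Both return the same number; the second simply makes the engine work much harder to get there.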

VAR

Marco Russo introduced me to variables at a conference session when they first came out. It took me a while for the application of variables to sink in, but when it did…wow. If you’re referencing the same scalar value, measure, or table more than once in a DAX formula, you might be able to use VAR. In a nutshell, every time you repeat the name of a measure or table expression, the DAX engine can make another “call” to your data. A variable reduces that to potentially a single call, which can have a huge impact.
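
Here’s a quick before-and-after sketch. [Total Sales] and the discount rule are made-up placeholders; the point is simply how many times the measure gets referenced:

-- Without a variable, [Total Sales] may be evaluated up to three times.
Sales After Discount :=
IF ( [Total Sales] > 1000000, [Total Sales] * 0.9, [Total Sales] )

-- With a variable, it's evaluated once and reused.
Sales After Discount (VAR) :=
VAR TotalSales = [Total Sales]
RETURN
    IF ( TotalSales > 1000000, TotalSales * 0.9, TotalSales )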

DAX Optimization Mantra: Write It Like the DAX Calls It

As I alluded to in the intro, I ran into an issue where a data model went from a refresh of seconds to many minutes (like 20). There was a lot going on in this refresh: database connections, multiple fact tables, 10+ dimensions, and lots of hearty DAX. I’ll spare you the gory details, but the decrease in performance came down to ONE DAX formula. That formula was the result of a beautiful, iterated string of DAX, just like I had been taught to do. But it made my solution very cranky, and here’s why…

The “call” this bottleneck formula made on the data set involved more than meets the eye. When you create a measure that is, for example, a measure plus a measure plus a measure, the resulting query effectively re-runs each of those measures individually. In a previous post, I presented an approach to creating an income statement model. There was some relatively straightforward DAX in that data model, like the calculation of net income actuals, which is the measure I’ll use as an example. Here it is, a harmless-looking DAX measure for net income actuals:

NI Act :=
[GP Act] - [OI&E Act] - [D&A Act] - [II&E Act] - [Taxes Act]

That said, if you analyze the measure above in DAX Studio, this is the call the measure is making on the data set:

[Screenshot: DAX Studio 1, the fully expanded measure definition]

To get this information, I connected DAX Studio to my data model, selected the measure, right-clicked on it, and chose “Define and Expand Measure.”


Lots of CALCULATE, and the same filter, Scenarios[ScenarioName] = "Actual", repeated over and over. An optimized formula would reduce, in this case, the number of CALCULATEs and the number of repeated filters; in general, you’ll want to reduce anything repetitious. At first glance, one would think that VAR would be a good candidate, but with all the filter context changes it doesn’t work here: a variable is evaluated once, in the filter context where it’s defined, so it can’t respond to the different filter contexts each CALCULATE creates. Instead, I first opted to create another “base” measure that filters GL amount on actuals:

GL Amt (Correct Signs) Act :=
CALCULATE (
    SUMX ( Accounts, [SUM GL Amt] * Accounts[Report Sign] ),
    Scenarios[ScenarioName] = "Actual"
)

I then used this measure to calculate my newly optimized net income actuals measure:

NI Act Optimized :=
CALCULATE ( [GL Amt (Correct Signs) Act], Headers[Header] = "Revenue" )
    - CALCULATE (
        [GL Amt (Correct Signs) Act],
        Headers[Header] = "Cost of Sales"
            || Headers[Header] = "Other Income & Expense"
            || Headers[Header] = "Depreciation & Amortization"
            || Headers[Header] = "Interest Income/Expense"
            || Headers[Header] = "Taxes"
    )

As you can see, instead of multiple formulas whose purpose is to adjust the filter context over and over, I’ve reduced it to one. When I rerun the formula in DAX Studio, this is the result:

[Screenshot: DAX Studio 2, optimized vs. not]

Not bad, I’d say! Most of the CALCULATEs have gone away, as has much of the repetitive filtering. The improvement in noticeable time isn’t dramatic because I wanted to give an easily understood example, but the percentage reduction is striking: from 14ms to 2ms, an 86% reduction in query time.
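
If you want to run the same comparison on your own model, one straightforward way (a sketch of the general approach, not the exact query behind the screenshots above) is to execute each measure as its own query in DAX Studio with Server Timings enabled, clearing the cache between runs:

-- Run each EVALUATE separately and compare the reported durations.
EVALUATE
ROW ( "NI Act", [NI Act] )

EVALUATE
ROW ( "NI Act Optimized", [NI Act Optimized] )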

VIDEO: Optimizing & Analyzing DAX Query Plans

FILTER vs. CALCULATETABLE: optimization using cardinality estimation

Using Variables in DAX


Matthew Mowrey

Matt Mowrey is the Senior Director of Finance & Analytics at a leading international development consulting services company. He manages FP&A, accounting, and business intelligence for the company’s largest business unit. Matt is particularly enthralled with DAX, M, and SSAS Tabular and uses them to create KPIs, monthly reporting packages, project forecasting tools, predictive financial modeling, U.S. government contractor indirect rate modeling (NICRA), and more. He enjoys traveling and playing board games with his family, and was a Peace Corps volunteer in Russia.

This Post Has 15 Comments
  1. I don't understand how having data in the model that you are not using affects performance. If it is not used in a calculation, how can it slow things down? I get that it takes up space (memory).

    The hardest thing I contend with is high cardinality. I have to have TRANSACTION ID, and distinct counts are killing me.
    The only solution I have (and am testing) is splitting the FACT tables and linking them on the primary key, which enables me to use COUNTROWS.
    But you take a hit on the split too.
    Initial testing shows splitting is still faster.

    1. Hello Fred. I certainly feel your pain–I've had transaction-level requirements too. You can try turning your transaction ID into the number data type (if you haven't already done so); that may help, since number data types put less strain on the compression engine than text. Something else you've probably thought of already is time scoping to reduce the number of rows, so that your solution matches the requirement exactly. And a final suggestion is grouping on some other attribute: for example, if counts of transactions are what you need, maybe you could live with counts by account (see the sketch below)? Just a couple of suggestions. Good luck!
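
    To put that last idea into code (a sketch only, with placeholder Fact[TransactionID] and Fact[AccountID] names):

    -- The expensive version: a distinct count over a very high-cardinality column.
    Transaction Count := DISTINCTCOUNT ( Fact[TransactionID] )

    -- The compromise: count at a coarser grain (accounts with activity), a far
    -- lower-cardinality column that's much friendlier to the engine.
    Active Account Count := DISTINCTCOUNT ( Fact[AccountID] )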

  2. This was extremely helpful. As I learn to build more and more complicated formulas, and my Power BI reports begin to slow down, I have always wondered about the “calculational overhead” effects of using them.

  3. Hello Matt, great article! I am pondering the first section ("Cardinality: Put simply, the more unique choices in a column, the greater the cardinality. For example, if there is a 'yes/no' attribute, the cardinality of that column is low. If there is a unique transaction number for each of 100 million rows, cardinality is high.") and wondering if that applies to a data model that I have. Table A is related to Table B. In Table A, I have an added column using =RELATED, which delivers the value from the related row of Table B. Would you classify this as HIGH cardinality? What about =COUNTROWS(RELATEDTABLE(...)), also HIGH? I use these all the time and never really thought much about it…

    1. Hi John. Thanks for your comment and nice words! In general, I stay far away from RELATED–it *could* increase cardinality depending on what you’re relating, but the strain on the calculation engine is most concerning for me. When I feel the need to use RELATED, I usually go for a snowflake rather than star schema data model (so I have dimensions of dimensions). My article on income statement modeling touches on this. When you get into the section about header and subheader tables, these are examples of tables that would be typical RELATED candidates.

  4. Nice article Matt! Here’s a slightly cleaner DAX trick for you.

    IN can be used instead of many || statements:

    - CALCULATE (
        [GL Amt (Correct Signs) Act],
        Headers[Header] IN {
            "Cost of Sales",
            "Other Income & Expense",
            "Depreciation & Amortization",
            "Interest Income/Expense",
            "Taxes"
        }
    )

    Just wrap the list of values in curly braces!

    SQLBI article here: https://www.sqlbi.com/articles/the-in-operator-in-dax/

    1. Hi Chris. Your comment was a game changer for me. I have a model that was completely OR’d out. I haven’t tested if there was a performance improvement, but it certainly made my formulas more readable. I’m training people up using this particular model too and the ORs can make your eyes cross. Thanks again! Much appreciated.

  5. I love the straightforward approach in this article! Thank you for a clean and concise case study in DAX optimization.

  6. Hi Matthew,
    Quick question on the performance part: I assume you cleared the cache before each execution, right? (I would do so…)
    I'm asking because of the following concern: caching. If I have a full income statement showing every step, the individual elements of [GP Act] - [OI&E Act] - [D&A Act] - [II&E Act] - [Taxes Act] would each be calculated and, if I'm not mistaken, put in the cache. Afterwards, NI Act would use the cache directly, making the calculation pretty much instantaneous, instead of having to re-evaluate the formula.
    Am I right in thinking that, in the scenario I have described, using your approach would be less efficient (in real-world use, with caching, even if it is the first time the model is loaded)?
    Thanks a lot
    Martin

  7. Can you elaborate on caching? If I open a report for ALL YEARS with a high-cardinality distinct count column, it takes 60 seconds.
    Is the cache shared with the next user, or does it exist only for that person's session?
    When does the cache go away? When a new Power BI report is saved over the original? When a change is deployed from Visual Studio to the cube? When the cube is refreshed?

    If the cache can be shared, is there a way to run a report without user intervention so that when users get to work they can take advantage of it?
