skip to Main Content

badger-44210_640

I remember feeling that way, Mr. Badger, but they’re simple… if you can SEE them!

Have you ever needed to SUMX() over the values in more than one column? SUMMARIZE() can do it!

I tried solving this problem and the internet led me to SUMMARIZE(). It was stuffed inside another function and it confused me. I felt a lot like my friend in the above picture… discouraged. DAX had been so fluid until this point! There were learning curves and lots of new ideas, but I had never bumped into a function and heard my brain just say, “Nope.”

SUMMARIZE() was confusing me because I couldn’t see what was going on.

The problem was that with all of the others I had been exposed to, the functions were “follow-able.” Either the function intuitively followed its name or I was able to go into my data model and filter some things and SEE what was really happening (by the way, Power BI, get on that). With SUMMARIZE(), there were things happening and I couldn’t see under the hood.

Enter, Power BI Calculated Tables – Seeing SUMMARIZE() made easy!

Before Power BI calculated tables, the only way I knew of to see the results of a DAX table function was DAX Studio (made by our friends over at SQLBI). With this calculated table functionality in Power BI, seeing SUMMARIZE() is as easy as writing it. I’ve built a simple AdventureWorks data model with a Calendar, list of Products, and Sales.

DataModel

Let’s start with a simple example: Summarizing one table…

We can go to Modeling>New Table in Power BI and try out SUMMARIZE() and see how it transforms a table:

Summarize_Test = SUMMARIZE(‘Calendar’,’Calendar'[CalendarYear],’Calendar'[CalendarQuarter])

summarize

SUMMARIZE() works by taking the table we gave it in the first argument (Calendar) and grouping the rows by the distinct combinations found in the columns we gave it in the second and third arguments (Calendar[CalendarYear] & Calendar[CalendarQuarter]). I like to think of it as VALUES()-Plus.”

VALUES() takes 1 column & gives the unique values, SUMMARIZE() does the same for 2 or more.

To bring this full circle, let’s stuff our SUMMARIZE() into a formula and show how much SEEING the result helps us to understand what’s going on… Let’s say we want to know our average quarterly sales…

SalesTotal = SUM(Sales[SalesAmount])

AvgQtrlySales = AVERAGEX(SUMMARIZE(‘Calendar’,’Calendar'[CalendarYear],’Calendar'[CalendarQuarter]),[SalesTotal])

In the first measure, I simply write a measure to total up our sales dollars. In the second, we iterate over the combinations of ‘Calendar'[CalendarYear] and ‘Calendar'[CalendarQuarter] and treat each of them as if they were coordinates on a pivot table (filter context). This produces a SalesTotal for each Year/Quarter combo that we then take the average of.

Where does ADDCOLUMNS() fit into all of this?

Let’s say my explanation in the previous paragraph left you feeling a bit lackluster and you want to SEE the [SalesTotal] for each combination of Year and Quarter in a table. That’s where ADDCOLUMNS() comes in! Let’s take our Summarize_Test table from above, throw in an ADDCOLUMNS() and take a peek:

AddCol_Test = addcolumns-1

ADDCOLUMNS(

     SUMMARIZE(

          ‘Calendar’,

          ‘Calendar'[CalendarYear],

          ‘Calendar'[CalendarQuarter]

     ),

     ”Sales”,[SalesTotal]

)

ADDCOLUMNS() works very much the same as my explanation of [AvgQtrlySales] above, without any averaging; it took each combination of Year and Quarter as if they were coordinates on a pivot table (filter context) and it calculated [SalesTotal]. The first argument is the name of a table or the result of a table function (like SUMMARIZE()), the second is the name of the column we are about to add, and the third is the rule for creating it.

Tying it all together

A nagging question you may have had… “I can get these answers using a simple pivot table, why learn this?” The answer here is simple: Sure, we can get the values that ended up in our tables, but what if I want to use them dynamically in other calculations? Even if I use DAX to generate these tables, they will only refresh if I change the formula or refresh the data model. This provides a path for dynamic table calculation!

If we’re trying to hone in on what SUMMARIZE() and ADDCOLUMS() really do, SUMMARIZE() is the grouping guru and ADDCOLUMNS() is best at adding columns to DAX tables! The power that both of these functions provide far surpasses what we’ve talked about in our intro examples. Just like the first time I used CALCULATE(), we can use these patterns, without fully understanding them, to make it rain money for our companies. Play around with them and you’ll find all sorts of nuanced behavior, additional capabilities, and performance hacks.

#Badger

Come for the Techniques, Stay for the Business Value!

We get it:  you probably arrived here via Google, and now that you’ve got what you needed, you’re leaving. And we’re TOTALLY cool with that – we love what we do more than enough to keep writing free content.  And besides, what’s good for the community (and the soul) is good for the brand.

But before you leave, we DO want you to know that instead of struggling your way through a project on your own, you can hire us to accelerate you into the future. Like, 88 mph in a flux-capacitor-equipped DeLorean – THAT kind of acceleration. C’mon McFly. Where you’re going, you won’t need roads.

This Post Has 21 Comments
  1. Awesome post Ryan, thanks for pushing the technical limits and exposing these functions to the readers.

    I think it’s worth mentioning that the additional columns that the ADDCOLUMNS() function creates depends on context transition to generate the proper result. Without going into too much detail, here’s the 2 takeaways:

    1) If the “rule” you want to use is already defined in a measure, just use it (as was shown in the article above). That’s because there’s an implicit CALCULATE() being called whenever you refer to a measure. This causes context transition to happen as expected.

    AddCol_Test_using_Measure =
    ADDCOLUMNS (
    SUMMARIZE ( ‘Calendar’, ‘Calendar'[CalendarYear], ‘Calendar'[CalendarQuarter] ),
    “Sales”, [SalesTotal]
    )

    2) If the “rule” you want to us is NOT already defined in a measure, be sure to wrap the entire formula inside a CALCULATE(). This will force context transition, which is necessary to get the correct result
    ex. if you want to use MAX( TableName[ColumnName] )….use CALCULATE( MAX( TableName[ColumnName] ) ) instead
    Putting it all together:

    AddCol_Test_using_Formula_with_extra_Calculate =
    ADDCOLUMNS (
    SUMMARIZE ( ‘Calendar’, ‘Calendar'[CalendarYear], ‘Calendar'[CalendarQuarter] ),
    “Sales”, CALCULATE ( MAX ( TableName[ColumnName] ) )
    )

    1. Chris,

      This is a great point! Any time we are iterating (as ADDCOLUMNS() does), it is important to keep in mind what we are iterating over and where that is in relation to where the calculation is happening. Hopefully, having the ability to SEE what is happening in these DAX table functions allows us to explore the nuances like this one!

      Thanks for the comment!
      -R

  2. Copy more comments:

    1). SUMMARIZE has parameters to virtually extend table (and performs context transition so no ‘CALCULATE’ needed.).

    New Table =
    SUMMARIZE (
    ‘Calendar’,
    ‘Calendar'[CalendarYear],
    ‘Calendar'[CalendarQuarter],
    “Sales”, SUM ( TableName[ColumnName] )
    )

    Though per sqlbi your ‘ADDCOLUMNS’ method is considered a best practice. https://www.sqlbi.com/articles/all-the-secrets-of-summarize/
    but good to point out to readers if they look at the function reference and get confused with ‘ADDCOLUMNS’ syntax.

    2) You Included a Dimension/Lookup table in first parameter which works in your scenario; but it will return all combinations of Year/Quarter in table – even if fact table doesn’t have them. If you include a ‘Fact’ table as first parameter, you get a summary table of only those Year/Quarter combinations which actually exist in Fact table. Which is best? It depends…
    Also, by using Fact table as first parameter, you can include columns from other dimension tables like so:

    New Table =
    ADDCOLUMNS (
    SUMMARIZE (
    ‘FactTable’,
    ‘Calendar'[CalendarYear],
    ‘Calendar'[CalendarQuarter],
    Product[Color]
    ),
    “Sales”, SUM ( TableName[ColumnName] )
    )

    Great post though! Thanks for writing it up.

    1. We have the best readers!

      Another pair of great points. I was very conflicted about including these two, in particular. Decided to keep it intro level.

      You should apply for a job with us 🙂
      -R

      1. Apply for a job? Maybe someday. I live in Fairbanks, Alaska so not sure how that would work…heck, a lot of people think we are a foreign country. But I tell ya, I have sent 4 people in recent months to Rob’s classes so you guys do owe me. I should get a kickback or something. 🙂

  3. Hi Ryan,

    IF you’re a stickler for DAX correctness THEN

    SUMMARIZE(‘Calendar’, ‘Calendar'[CalendarYear],’Calendar'[CalendarQuarter])
    => Multi column DISTINCT()
    and
    SUMMARIZE(VALUES(‘Calendar’), ‘Calendar'[CalendarYear],’Calendar'[CalendarQuarter])
    => Multi column VALUES()

    END IF

    If you always use best practices this rarely has any practical importance.
    I think it’s a great idea to revisit some of the olden DAX functions with new details added in. I think your readers new or old appreciate this.

    PS: I like @Matt’s typo. If you write books on DAX aren’t you a kind of DAX poet?

    1. Hmmm. This is interesting as i have never thought to use VALUES on a table. I have only used on a column reference. The only difference between VALUES and DISTINCT that I am aware of is VALUES may include BLANK value, whereas DISTINCT does not. I’ll have to play around a bit with it.

  4. Just a quick comment to the DAX novices following this post. To find your way off the island of misfit functions, you will need to bookmark this post and return when ready. The “core of DAX” is primarily calculations based on tables. How many rows (of this table) equal this. What is the average of the column from that table. Even a single value that is one row wide and one row tall can be a table. There are advanced concepts such as expanded tables and context transition that can be understood using the understanding of the “core of DAX”. When you get to expanded tables, you will see that it is still a table, but the table is larger than you may have originally thought. And context transition can be thought of as a value returned (calculation) from every row of, you guessed it, a (table) – But don’t worry about understanding that just yet. Ryan let the “secret sauce” of DAX out of the bag – that is, to understand the results from calculations, you need to understand what table(s) that result is based upon. If you were asked to count the number of something, but you could not see, touch, or understand the “of what”, how could you possibly understand the result.

    You must be able to see tables, either in your mind or on your screen to move forward with DAX. Instead of “I see dead people”, it’s “I see tables”. If you don’t have Power BI Desktop, use the incredible tool called DAX Studio to see your tables. BTW, DAX Studio can be used for much more than just seeing your tables. If your company won’t allow you to use these tools, you can still use the wonky method here https://exceleratorbi.com.au/uses-dax-query-language that shows how you can see your tables in Excel without adding any other tools.

    Regardless of how you do it, you must see your tables to go to the next level with DAX. Now if I could only find a way to tell my ‘2014 self’ to go to 2017 to read Ryan’s post. Oh well, just think of me as your own ‘2020 self’, yelling at your ‘2017 self’ to following Ryan’s advice. Now go see and play with your tables.

  5. Hi, your article has invoked me to try the summarize and add columns.

    i have created a virtual table that has a new columns that calculates a Qty variation per line, for example (Q2-Q1)*P0 per line
    is there a way i can Sum the new columns so i can use the total variation per table selection?)

    to put it more simply, can i SUM a virtual column created by ADDCOLUMNS?

    thanks

    1. Hi Eitan, try this

      SUMX (
      ADDCOLUMNS ( ‘some table expression’, “Qty variation”, ( Q2 – Q1 ) * P0 ),
      [Qty variation]
      )

      you’ll have to insert your own table expression and the variation formula from your data model.

      Have a nice day.

Leave a Comment or Question