skip to Main Content

A quick update, as tonight I managed to sit down and experiment a bit with the V2 Beta for the first time in many days.

(OK, actually, I am standing up…  because I have a fantastic new treadmill desk and have walked nearly 8 miles today while working, but that’s a topic for another post).

I was working with one of our HostedPowerPivot clients today who was observing slower-than-expected slicer click performance in one of their reports.  And after some sleuthing, I found that they had a “distinct count” measure in their model, something like this:

[Measure] = COUNTROWS(DISTINCT(Table[Column]))

Now, that’s a pretty useful measure in many situations, like “does product X sell every day?”  Simply taking a count of the date column doesn’t cut it, because if it sells twice on one day and not at all the next day, the count is still 2, and you want it to be 1.

Trouble is, in PowerPivot v1, that operation can be slow.  I won’t bore you with all the details, because frankly, I don’t know all of them, heh heh.  But I do know that PowerPivot v1 does end up creating all kinds of temporary new tables in memory behind the scenes when it is evaluating the measure.

Anyway, I’d heard a long time ago that PowerPivot V2 was going to make distinct counts a lot faster, so I thought I’d try it out.

The Test Case

First I needed to create a test case that was challenging.  I don’t want to compare “fast query vs. fast query” because that often gets skewed by fixed overhead.  Much better to test a “worst case scenario.”

So, I took a 15 million row table, and added a calculated column to it.  Calculated columns in PowerPivot are much less compressed than imported columns, and are therefore more challenging for the measure engine to scan through, looking for duplicates etc. which is required for a distinct count.

And then, to make matters worse, I just used the RAND() function so that there are many unique values:

image

Cutting to the Chase:  The Results

A distinct count measure in PowerPivot v1 took 35 seconds to complete in my test pivot.  This was a big part of why our client was seeing slow perf today.

That same measure, in that same pivot, took less than a second in the V2 beta.

Wow.  This is gonna be nice.

One Last Note:  Two Ways to Write the Formula

PowerPivot v1 did not have a dedicated function for distinct count, you had to do the countrows of distinct thing like what I showed above.

But in PowerPivot v2, there IS a DISTINCTCOUNT() function.  So you can write your distinct count measures in two ways:

[Measure] = COUNTROWS(DISTINCT(Table[Column]))

[Measure] = DISTINCTCOUNT(Table[Column])

That new function is nice, makes for a more readable formula.  BUT…  the performance is still the same.  Either way I wrote it, the measure was sub-second fast.  Very nice.

Rob Collie

One of the original engineering leaders behind Power BI and Power Pivot during his 14-year career at Microsoft, Rob Collie founded a consulting company in 2013 that is 100% devoted to “the new way forward” made possible by Power BI and its related technologies. Since 2013, PowerPivotPro has rapidly grown to become the leading firm in the industry, pioneering an agile, results-first methodology never before seen in the Business Intelligence space. A sought-after public speaker and author of the #1-selling Power BI book, Rob and his team would like to help you revolutionize your business and your career.

This Post Has 6 Comments
  1. Rob, it’s great to hear of those performance improvements with distinct count.
    On a related topic, have you experimented with using DISTINCTCOUNT() on a text column? The documentation seems to state pretty clearly that it only works on numbers or dates. http://technet.microsoft.com/en-us/library/gg492142%28v=SQL.110%29.aspx
    Because of that I filed a suggestion on Connect to create a DISTINCTCOUNTA() function that would work on text values. https://connect.microsoft.com/SQLServer/feedback/details/679985/create-a-dax-distinctcounta-function
    But subsequent experimentation showed that DISTINCTCOUNT() does work on a text column. What has you experience been with DISTINCTCOUNT() on text values?

    1. I have not yet tried DISTINCTCOUNT() on anything other than the example above. But given that COUNTROWS(DISTINCT()) has always worked on text fields, and proves to have the same improved performance as DISTINTCOUNT, I too am surprised that the docs say it doesn’t work.

  2. Okay, that’s awesome about the distinct counts being faster–but what on earth is a treadmill desk?? I thought I was being proactive by sitting on a bouncy ball.

  3. Its been sometime I am trying to get my head around this. I have imported information on close to 1.5 million customer orders in a powerpivot table. Customer orders generate different delivery packets. Sometimes for the same customer orders there are several packets generated. So the data looks like:
    Order,Package
    e1,p1
    e1,p1
    e1,p2
    e1,p2
    e1,p3
    e2,p5
    e2,p6 etc..

    What I want to do is calculate in adjacent field a count of all unique packages that the customer order has generated. In the above example that would be:
    Order,Package
    e1,p1,3
    e1,p1,3
    e1,p2,3
    e1,p2,3
    e1,p3,3
    e2,p5,x …

    I just learnt about powerpivot and am not able to understand how this could be achieved. I need this field for calculation purpose. Many Thanks!

Leave a Comment or Question