Post by Rob Collie

Detailed Pivot Report Using Flattened Pivot

Does Your Pivot Look Like This?  Does its Slow make you Sad?  Time for a Fix!

Tell me if this sounds familiar…

Yes, you know that pivots are meant to show aggregations.  Summaries.  Pivots were NOT invented to display thousands of rows of detail data.

But still, sometimes you need to do precisely that. The biz needs its list of customers and how much they’ve been buying, for instance, and all that data is in YOUR Power Pivot model.

And hey, pivots are really the only game in town* for table-shaped display of data.  So, you build one of the monstrosities like the above.

(*OK yeah, you DO know about this thing called DAX Query Tables, but those are seriously a pain to set up.  So, no.  You rule those out before even starting.  Just like me!

So You Do The Flattened Pivot Dance, Right???

In pictorial form…

Detailed Pivot Report Using Flattened Pivot

The Flattened Pivot Option – Found in the PivotTable Dropdown in the Power Pivot Window

Detailed Pivot Report Using Flattened Pivot

Next, You Pile a Whole Bunch of Fields Onto Rows

Turn Off Subtotals In Your Flattened Pivot

Then, on the PivotTable Design Ribbon Tab, You Turn Off All Subtotals

And Voila!  It’s Slow as Heck.

Depending on the number of fields on rows, the number of tables involved, and the row counts and amount of uniqueness in those tables and fields, your flat pivot can range anywhere from “a little slower than average” all the way up through “this pivot will not finish calculating before our Sun goes Red Giant on us.”  (Actually I think that certain pivots might literally meet that latter definition.  More on this later.)

In my example, I have small row counts (every table has less than 100k rows), so even though I have 12 Row fields from 3 different tables, my pivot is simply annoying-slow, not galactic-slow.

Our Detailed Flattened Pivot is SLOW.

This Slicer Click Took 11 Seconds.
(But In Larger Data Sets, a Pivot Like This Can Take Hours.)

Why Is it Slow?  The Short Version, Please?  Please?

Here, let’s zoom in on the upper right corner of the pivot:

image

The Five Row Fields Pictured Come From Three Different Tables

Hey, that first customer, let’s call them customer XYZ, lives in Australia, has an Australian-style phone number, and has Cookie Monster for a Sales Rep.  All well and good.

But Power Pivot doesn’t “know” that. 

Behind the scenes though, the Power Pivot engine tried, um, a LOT of other combinations. For good measure (ho!  good measure!  get it??), it calculated a combination of customer XYZ, Germany for a country, North America for Continent, and Abby for a Sales Rep.  Oh, and with a British phone number belonging to customer QRS to boot.

In fact, Power Pivot calculated EVERY SINGLE COMBINATION*.  It just turns out that, hey, most of those combinations result in no data – there just isn’t any sales data for a country named Germany in a Continent named North America.  So the engine sends back a BLANK() result, and the pivot doesn’t display blank results by default.  You don’t SEE it, but trust me, it crunched it.  While you waited.

This can literally result in the engine having to perform MILLIONS of “silly” calculations for every “real” calculation.  And millions is when you’re lucky.  It can get into Billions pretty quickly.

(*OK, I am not sure if Power Pivot tries nonsensical combinations within the same table.  So between all the fields in the Customers table, it might be smarter and not do all the silly extra work.  But across tables, definitely – it tries every permutation.)

An Aside…  NECJ Unmasked!

imageBack in 2003 I was basically a high-functioning child, and I was in Denmark on Serious Business.  In my job at Microsoft, I had been placed in charge of the BI features in the next release of Excel, aka Excel 2007. 

So naturally, I had to go learn about BI.  A guy named Allan Folting was my “tour guide” for this trip, because he  a) was from Denmark  b) knew a hell of a lot more about BI than I did  and c) he reported to me.  There were Bona Fide BI Customers to be found in Denmark, and Allan knew where they were.

On one of those visits, Allan and the customer started talking about Non-Empty CrossJoins.  I had no idea what that meant, but I could tell by the way they pronounced it that it was Not Good.  The customer was unhappy.

Since this sounded Bad, I asked Allan later to explain. He used a bunch of words.  I made a half-hearted attempt to understand and then gave up.  I was dedicated to my craft back then. 

Anyway, I now know, in hindsight, that THIS topic – detailed pivot reports where the engine “boils the ocean” to find a single grain of sand – is the NECJ issue they were discussing.  The rows that the pivot actually displays?  You know, the meaningful ones?  Those are the Non Empty rows.  The multiple tables?  That’s the CrossJoin part.  We even have the CROSSJOIN function in DAX, which I used for Pokemon analysis last year, and then for real business purposes just this past week actually.

This is a trend in my life actually – Older Rob going “aha, THAT’s what they meant when Younger Rob was spacing out!”  I think I just took a lot longer for me to mature, honestly.  I even lost my baby teeth a lot later than average.  I hope this means I live to be 120.  And yes, Excel 2007 turned out great.  I had a great team, and I actually was pretty good at my job.  I’d just be better today.

OK.  Here’s a Fix That May Help You.

Don’t put a bajillion fields on Rows.  Put ONE field on Rows, and then use Measures for the rest.

A Fix for Slow Detail Pivots

Interesting Eh?

And the pivot still looks kinda the same…

A Fix for Slow Detail Pivots

This Pivot is Conveying the Same Information as the First One…

A Fix for Slow Detail Pivots

…But THIS Slicer Click Took 3 Seconds Instead of 11.
(And in more complex cases, it would be MUCH more than 4x faster.)

The Approach and Formulas

First, determine what each row of your report is supposed to show.

If each row is a single Customer, for instance, you MUST put a unique identifier for Customer on Rows.  In this case I used the CustomerID column.  (And if each row is a Product, well, you need a unique id for Product).

Then, for fields from the same table as the ID (Customers table in this case), the measures are simple-ish:

  First:=FIRSTNONBLANK(Customers[FirstName],1)

  Last:=FIRSTNONBLANK(Customers[LastName],1)

Etc. etc. for all the other fields in the Customers table we care about, like Phone and the like.

And remember, those only work if you have a unique identifier on Rows of the pivot!

Fields from Other Tables

That simple FIRSTNONBLANK formula isn’t going to work for the other tables, like Territories.

Since the Customers table has no relationship to Territories, the unique customer id column has no filtering power over Territories.  So the FIRSTNONBLANK will operate on an unfiltered Territories table, and just give you the alphabetical first value from the entire table.  That’s not what we want.

So here’s what I’ve done for Country and Continent, from the Territories table:

  Cntry:=

  CALCULATE(FIRSTNONBLANK(Territories[Country],1),
           
FILTER(Territories, [Total Purchases]>0))

  Cont:=

  CALCULATE(FIRSTNONBLANK(Territories[Continent],1),
            FILTER(Territories, [Total Purchases]>0))

The FILTER strips the Territories table down to rows that have data in the Sales table (as represented by our [Total Purchases] measure).  And since we only have one Customer in our filter context for each row of the pivot, that means we only get Territories rows where that Customer was active.  In this case, that means we get their home country and continent.

Added Benefit:  Sorting Works!

Let’s go back to our first pivot, the one with all the fields on Rows.

And now let’s sort the pivot Largest to Smallest by the Total Purchases amount…

Sorting Doesn't Work Well in Flattened Pivots

Let’s Sort This Sucker!

And the results…

Sorting Doesn't Work Well in Flattened Pivots

The Sort Operation Completed.  But NOTHING Changed.

What the heck happened?  Well, the pivot DID sort the Sales Rep field largest to smallest.  See the little arrow on the dropdown button next to “Sales Rep” in the picture above?  But since each Sales Rep value is being sorted “within” all of the other fields, each single value of Sales Rep is being sorted against itself.  Which moves nothing.

You CAN work around that, by going to the dropdown button on the ID column of the pivot instead, choosing More Sort Options, and going from there to sort ID descending by the measure.  That way you get what you want.

But what if you want to sort by First Purchase? You are flat out of luck.  Seriously, you’re stuck if you’re using Row-based flat pivots and want to sort by any of the “property” columns other than the ID itself.

In the measure-oriented pivot though? You’re in BIZNESS…

Sorting Works Much Better in Measure-Oriented Detail Pivots

Absolutely, You Can Sort by First Purchase in the Measure-Oriented Pivot!

This results in the most recent first-purchasers floating to the top…

Sorting Works Much Better in Measure-Oriented Detail Pivots

Results.  We Like.

Yeah, OK, I’m cheating a bit here.  This post is called Text Measures, and I just sorted by a Date measure.  Excel pivots can’t sort accurately based on true text measures for some reason.  The option is there, but it gives me weird results, at least in 2010:

image

I Just Sorted by the Occupation Measure.  But it’s not doing the right thing.

Oh well, can’t win ‘em all.

  Subscribe to PowerPivotPro!
X

Subscribe

Rob Collie

One of the founding engineers behind Power Pivot during his 14-year career at Microsoft, and creator of the world’s first cloud Power Pivot service, Rob is one of the foremost authorities on self-service business intelligence and next-generation spreadsheet technology. 

This Post Has 17 Comments

    1. Agreed, if we’re TRULY performing detail reporting.

      But not if we’re performing aggregations and filtering though, which are the strengths of the DAX engine. The pivots in this example are not TRULY detail-level, since they DO aggregate [Total Purchases] over time, Product line, etc., AND allow interactive slicing. PQ will never be a good solution for that stuff, because it isn’t meant to be.

      1. Just unwrapped two autographed copies of Jelen’s “XL” while reading your reply. I must have been clipped by a bus while cycling home… and went straight to spreadsheet heaven. Many, many thanks for this example Rob, can’t wait to take a crack at it in the morning.

  1. Hi Rob, the real reason why this type of PivotTable is slow isn’t the crossjoin, it’s that Excel returns some subtotals even when you have turned them off. This post, and the posts by Rui Quintino and Richard Lees that it links to, has more detail: http://blog.crossjoin.co.uk/2013/05/29/aggregating-the-result-of-an-mdx-calculation-using-scoped-assignments/

    Another solution to this problem is to use custom MDX inside named sets, as described here: http://blog.crossjoin.co.uk/2012/10/21/introduction-to-mdx-for-powerpivot-users-part-3-the-members-and-crossjoin-functions/

    1. If you use Power Query to retrieve data from your cube the default output format will be a flattened table (see this step-by-step-guide on how to do it: http://blog.crossjoin.co.uk/2015/01/13/a-closer-look-at-power-queryssas-integration/ ). You can then reorder your columns as you like or do pivots on them if you need subtotals (as the pivot on this data (table) doesn’t suffer the problems like the above mentioned pivots on cubes). Actually no additional “formatting measures” needed as in this Power Pivot example.

      If you’re after some advanced output & formatting options on data coming from Analysis Services, have a look at my latest blogpost: http://www.thebiccountant.com/2015/08/12/report-design-with-power-query-1-cascading-time-granularities/

  2. Great post Rob. I didn’t realize that flattened pivots could have such bad performance problems.

    The only thing that I don’t like about FIRSTNONBLANK is that you don’t know if there are actually more values when you are only expecting to get one.

    I like to do the following:

    1. Put the unique column on the rows like you did

    2. Use this measure:
    IF(ISFILTERED(Territories[ID]),VALUES(Territories[ID]),BLANK())

    3. Then the pivot table will break if we unexpectedly get more than one value which means the unique column isn’t actually unique

  3. Hi, I really appreciate all the help you’re giving us. I have a little problem using the FIRSTNONBLANK function. Suppose I want to write a measure that display a Customer number and sometimes, this number can start with some zeros.

    For example : “0000142”

    I managed to retrieve the number but the FIRSTNONBLANK function seems to remove all the zeros and format to a number. I also tried changing the Field format setting it to text but it didn’t work.

    Can you help me on this one or refer me to another post :).

    Thank you!

Leave a Comment or Question