PowerPivotPro is Coming to Phoenix

February 20 - 22, 2018

Registration for 2018 Public Training is now open!


Use the discount code “3ORMORE” when signing up 3 or more people.

FEBRUARY 20 - 21

Foundations: Power Pivot & Power BI

Instructor: Kellan Danielson

Supercharge your analytics and reporting skills with Microsoft’s dynamic duo. Designed to handle huge volumes of data, these tools will transform the way you work! Two days in our class and you are EMPOWERED!

  • Learn Microsoft’s secret weapon behind Power Pivot & Power BI: DAX
  • Taught by Kellan Danielson – PowerPivotPro Partner and Vice President of Client Services
  • You don’t need to be an IT professional – most of our students come from an Excel background

FEBRUARY 20 - 21

Level Up Series: Advanced DAX

Instructor: Ryan Sullivan

The Advanced DAX Course was such a hit in the first half of 2017 that we’ve expanded the course to 2 days!


  • This advanced DAX training class is taught completely in Power BI Desktop.
  • Students are encouraged to take our Foundations course and have hands-on experience with the DAX language.
  • Taught by Ryan Sullivan – Principal Consultant.
  • Class material drawn from Advanced DAX applications used while consulting with hundreds of international firms.


Level Up Series: Power Query for Excel & Power BI

Instructor: Krissy Dyess

The second class in our Level Up Series is Power Query for Excel & Power BI.

  • Students are encouraged to take our Foundations course and have hands-on experience with Power Query in Excel or Power BI Desktop.
  • Taught by Krissy Dyess – PowerPivotPro Principal Consultant and Phoenix native!
  • We will cover common to unique business challenges made easy with Power Query’s data wrangling capabilities.
  • Intermediate-to-advanced Power Query best practices distilled into easy-to-understand patterns you can apply to your most common business challenges.


Johnny take a tour of your data, and tune…
…up your columns and your rows, to free up some room
You’ve been following some rules, with the former Vertipaq
Now you’re seeing some results, that you don’t understand
(Click image for the original music and lyrics)

Modifying Lyrics From U2’s Best Album Ever

Yes, “Achtung Baby” is U2’s best album ever.  Yes, I am deliberately trying to pick a fight.

(“Former” VertiPaq?  Why yes, Microsoft recently renamed the storage engine under the hood of PowerPivot to be xVelocity.  Yes, VertiPaq was a better name.  Yes, there were marketing reasons for the rename, to create an “umbrella brand” over many technologies.  No, this does not help any of us out here in the real world.  I wonder how long I will keep calling it VertiPaq.  Probably a long time.)

An Exception to One “Rule,” and the Reinforcement of Another

Longtime readers of this blog have seen me tout the benefits of reducing columns in your data, even if that means adding a lot more rows (see also David’s coverage here).

Well, I recently discovered an exception to that “rule.”  And that same data set also yielded a VERY dramatic example of the benefits of sorting your data prior to import.

Let’s cover one at a time.  But first, some perspective.

Disclaimer:  This Post is Probably Just Brain Candy

The data sets I am working with are in the tens of millions of rows and therefore don’t reflect “normal” usage for most people reading this (although I’d argue that is changing, to an extent).  What I’m seeing in this data is very unlikely to repeat itself in smaller data sets.

That said, the impact of reducing columns (even at the expense of more rows) is not something you will typically notice with small data sets.  If you only have, say, 20 thousand rows of data in a table, I would not worry too much about reducing column sets – you still shouldn’t import columns that you don’t need, but I wouldn’t go to great lengths to reshape your data set.

So file this post under “interesting brain candy that may be relevant someday” rather than “something you intrinsically need to know.”  This is 90% fun, with a slice of 10% relevance.  OK, time to dig in.

Original Data Set:  Two Wide Tables

I had two wide tables in my data. Yes, they look VERY similar, but trust me, they are different tables:



Total File Size:  561 MB

OK, Now Make Them Tall and Narrow!

I was naturally tempted to try importing these same tables as tall and narrow, by introducing an amount type column.
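The reshape described here is a standard unpivot: each measure column in the wide table becomes its own row, tagged by a new amount-type column. A minimal Python sketch of the idea (the column names below are illustrative, not the actual columns from this data set):

```python
# Hypothetical wide rows; "Sales" and "Cost" stand in for the real
# measure columns, which are not shown in the post.
wide_rows = [
    {"Date": "2012-01-01", "Product": "A", "Sales": 100.0, "Cost": 60.0},
    {"Date": "2012-01-01", "Product": "B", "Sales": 80.0,  "Cost": 50.0},
]

measure_columns = ["Sales", "Cost"]

# Each wide row becomes one tall row per measure column, with a new
# AmountType column identifying which measure the Amount came from.
tall_rows = [
    {"Date": r["Date"], "Product": r["Product"],
     "AmountType": m, "Amount": r[m]}
    for r in wide_rows
    for m in measure_columns
]

print(len(tall_rows))  # 2 wide rows x 2 measures = 4 tall rows
```

Fewer columns, more rows: the table gets taller and narrower, which usually (but, as we're about to see, not always) compresses better.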



Total File Size: 1,207 MB

Whoa, it Doubled the File Size!

Yes, a normally-reliable trick has betrayed us here.  Why?

Well, I have not confirmed this with my former colleagues at Microsoft, but I suspect it has something to do with VertiPaq storing data in 1M-row partitions.

That’s right – if you have more than one million rows in a table, they are not all compressed and stored as a single unit.  Your data is first broken into partitions of one million rows each, and then each partition is compressed and stored individually (“chunk” in Rob-speak).

“Chunking” does reduce the effective compression rate.  In some cases we would get MUCH better overall compression if all the data were stored in a single chunk, but that comes with other tradeoffs that I don’t have room to go into here.

Why Does the One Million Row “Chunk Size” Matter?

Well first of all, in the original “wide” data set, I had 45M and 44M rows.  So, a total of 89 partitions/chunks.

In the new data set, I have 366 chunks.  So each chunk would need to be 4x smaller than the original chunk size just to result in a similar file size.

Given that the file size doubled, we can guess that each new chunk was half the size of the original chunks.
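The chunk arithmetic above can be sketched directly (row counts are from the post; the 1M-row partition size is as described, and the exact row count behind the 366-chunk figure is my inference from that chunk count):

```python
import math

PARTITION_ROWS = 1_000_000  # VertiPaq's 1M-row partition ("chunk") size


def chunk_count(rows: int) -> int:
    """Number of 1M-row partitions needed to hold a table."""
    return math.ceil(rows / PARTITION_ROWS)


# Original wide tables: 45M + 44M rows
wide_chunks = chunk_count(45_000_000) + chunk_count(44_000_000)
print(wide_chunks)  # 89 chunks

# The tall/narrow version produced 366 chunks, so to match the original
# file size each chunk would need to compress roughly 4x smaller:
print(366 / wide_chunks)  # ~4.1

# Instead the file roughly doubled, implying each new chunk was only
# about half the size of an original chunk:
print((366 / wide_chunks) / 2)  # ~2.1, i.e. chunks ~half size, file ~2x
```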

Impact of Sort Order

Let’s take that knowledge of 1M row chunks and consider the impact of sort order as you import data.

My database is sorted by the leftmost columns in the table – Date, PeriodType, Product, etc.

That means my new AmountType column is basically unsorted – look at the pictures above and you will see that even in the tiny number of rows I snapshotted, AmountType is varying whereas all of the columns to the left display the same values over and over.

That means my AmountType column is very “fragmented” in each of the 1M row chunks, and I am not getting as much savings from the shape change as I would if it were not fragmented.
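To see why fragmentation hurts, here is a toy run-length-encoding sketch. VertiPaq's actual compression is more sophisticated than plain RLE, but the intuition carries: long runs of repeated values compress well, and a column that alternates values within each chunk produces many short runs. (The "Actual"/"Budget" values are hypothetical amount types, chosen just for illustration.)

```python
def run_lengths(values):
    """Collapse a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return runs


# A fragmented column: the amount type changes on nearly every row,
# as happens when the source is sorted by the leftmost columns.
unsorted_col = ["Actual", "Budget", "Actual", "Budget"] * 3

# The same values sorted first: just two long runs.
sorted_col = sorted(unsorted_col)

print(len(run_lengths(unsorted_col)))  # 12 runs for 12 values
print(len(run_lengths(sorted_col)))    # 2 runs for the same 12 values
```

Same data, same row count, but the sorted column describes itself in 2 runs instead of 12 – which is the kind of difference that shows up as file-size savings when the sort happens before import.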

So, let’s try sorting the data by AmountType before importing it!

Sorting by Amount Type Decreases File Size by 35%!


Note that This Was Just Table #1
(I Deleted Table #2 Before Running This Sorting Test)

Marco and Alberto were the first people to bring sort order to my attention.  In their experiment, they found a 25% savings on sorted versus unsorted data, but not much difference when they sorted by different columns.

In my case, the data was already sorted.  Changing which column I sorted by, BEFORE import, made a 35% difference.  So, your mileage may clearly vary.

Only Sort BEFORE Import Matters!

Just want to be overly clear here:  once you import the data, sorting in the PowerPivot window has NO impact on file size.  Actually, it has no impact whatsoever, except that it helps you navigate your data in the PowerPivot window.

Only sorting the data in the source, before importing into PowerPivot, has an impact.

In This Case, I Went Back to Wide Tables

All right, even with the 35% reduction from sorting, my resulting files would still be bigger than the original “wide” format.

I also ran some quick performance tests and both formats were returning my test pivot in about 4 seconds.

So in this one case, going tall/narrow didn’t help.  But that won’t discourage me from trying it again next time.  I am pretty sure this data set is an outlier – it currently only has two values in the date column for instance.

The moral of the story:  the tall/narrow trick is generally reliable.  But always test to make sure.



Rob Collie

One of the founding engineers behind Power Pivot during his 14-year career at Microsoft, and creator of the world’s first cloud Power Pivot service, Rob is one of the foremost authorities on self-service business intelligence and next-generation spreadsheet technology. 

This Post Has 5 Comments

  1. Hi Rob,

    One thing I noticed is that in the tall/narrow tables the ‘amount’ field is a decimal. On the wide tables there didn’t seem to be decimals (though it could be just the sorting on the small sample represented on the screenshot)

    Decimal values would be harder to compress due to the high cardinality; perhaps this could explain some of the growth in memory

    1. Hi Javier. That’s actually just because of the screenshots. The wide tables have decimal values in them too, just not in the visible range where I took the screenshots.

  2. Ah, this is *so* interesting for me (I must be in the 10%, with a long-and-narrow data table).
    Do you know if there is any way to increase the “chunk” size of 1M rows?
    In my case my high-cardinality columns have about 6M values each (and there is little I can change about that, taking into account that I need to distinctcount it so I cannot split it up), and I’m looking for ways to optimize the report interactivity…
    P.S.: would changing those columns from strings to integers help?

Leave a Comment or Question