What if I told you there was one key skill that has set me head and shoulders above the Excel crowd when working with Power Pivot and Power BI? Would you be interested in learning that skill? Great, because I was about to teach it to you anyway.

Pay It Forward

But first a story. I had discovered Power Pivot (code-named Gemini back then) while working in the Microsoft Commerce group, but I really came into my element at my next gig, the Microsoft Learning team (read about my Power Pivot Journey). There I met my colleague Gregory Weber (known as Greg). It’s hard not to be impressed by Greg. He was a Microsoft certified instructor and taught courses for many years, both within the US and internationally. He has the trainer’s voice and a fluidity with words, along with the ability to deconstruct complex topics and make them easy to understand. He can talk about pretty much any topic under the sun, technical or otherwise. Once you get to know him better, you’ll find out that he enjoys building and programming robots in his spare time (how’s that for a hobby?). Lately he has been pursuing a degree at Georgia Tech, while also helping to teach some of the courses there. Besides all of this, he is a genuinely nice guy.

It was Greg who suggested that we form a book reading club. Our first book was one I had heard about but never read – The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball. As a business analyst, I had leaned heavily on Excel, along with a mishmash of other technologies. Data warehousing and data modeling didn’t seem like topics that would be relevant to me – more for an IT/BI team, perhaps. But I figured it couldn’t hurt to learn something new.

Our book club meetings looked more as if class was in session. We brought in our questions, and Greg patiently answered them, helping us realize the importance of the topics and the trade-offs involved in various choices. As things go, our reading club was disbanded before we were even halfway through the book. But the knowledge I had gained helped me grow by leaps and bounds in my Power Pivot and Power BI journey.

Now the truth is, I would never have picked up this book and read it myself (much less understood it). It was only due to a teacher like Greg that I was able to internalize some of the concepts. So this article is my humble attempt to pass on the knowledge (hopefully without mangling it). I’m just paying it forward.

Lost Chapter From Our New Book: Power Pivot and Power BI

The article below was actually submitted as a chapter in our new book. However, under Rob’s keen eye, we realized that most of these topics were already woven in as best practices throughout the other chapters. While it didn’t make it into the book, we still feel it would be of value to our readers – hence this article.

If you were reading the book, this article would perhaps fit best as Chapter 19A (after Chapters 17/18/19 have covered Multiple Data Tables and Performance). You can get our book at Amazon or MrExcel.

Data Modeling – for Scalability and Usability
    Scaling to Millions of Rows
    Scaling to Multiple Data Sets
        Lookup Tables – the Who, What, Where, When, How
        Matrix of Lookup and Data Tables
        Crazy Lookups and Snowflake: Too Much of a Good Thing
    Usability

Data Modeling – for Scalability and Usability

Scaling to Millions of Rows

So you’ve read our advice that

• Wide Flat Tables = Poor Performance
• Narrow Data Tables with separate Lookup Tables (Star Schema) = Improved Performance

But you say to yourself, the performance of my model is just fine. Thank you, thank you very much. (You did say that the Elvis way, right?) That is the case for most new users adopting Power Pivot. They are working with relatively small models, and no matter how those models are structured, performance is reasonably good – especially when compared to all the effort it took (before Power Pivot) to build similar reports.

However, this honeymoon period is not going to last forever. Sooner or later you will end up working on a data model large enough that all these lessons become important. It is easier to start off following the best practices than to undo your work and unlearn bad habits later.

I won’t elaborate too much on this here; we’ve covered this ground already in the Performance chapter (Ch 19). Instead, I want to speak about other reasons why a good Data Model is critical to your success.

Scaling to Multiple Data Sets

So you are a new Power Pivot user and start your journey with a small dataset – say the Sales data that we started with, or something similar. At this point, for a small dataset, it doesn’t matter much whether you keep separate Data and Lookup tables (Star Schema) or a single flat-and-wide table. Since it is a small dataset, performance will not vary noticeably, and you can write pretty much any measure you need in either design.


With a single small dataset, there is not much difference, performance-wise,
between the two designs – Star Schema and Flat-and-Wide
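For example, a simple measure is written the same way in either design. Here is a minimal sketch (the SalesAmt column name is just illustrative, not necessarily what your source data uses):

    Total Sales := SUM ( Sales[SalesAmt] )

Whether Sales is a narrow data table surrounded by lookup tables or a single flat-and-wide table, this measure looks and behaves identically. The differences only start to matter as the model grows.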

It is always best to organize your lookup tables near the top and the data table near the bottom. Consider the above Star-Schema screenshot for illustration only.

Now think about what happens when you bring in another dataset. If we are using the flat design, we end up with two flat data tables – Sales and Service Calls in our case. That works as long as you analyze these two datasets separately.


Two Flat-and-Wide tables: Sales and Service Calls


Can create separate pivots to analyze the datasets separately…

However, often the magic happens when you can show measures from two different data sets side by side in the same pivot, and slice and dice them together. Or even better – write hybrid measures across the two datasets.


Analyzing disparate datasets together is the holy grail for analysts

But if you have two flat tables, how would you ever be able to accomplish this? This is a big moment, so I want you to pause and think about it.

 

<<This page intentionally left blank to give you a moment of cogitation>>

 

The answer is obvious, and we have already implemented this design in our model and exercises earlier. We need a common lookup table.


Common Lookup table would let us connect and analyze the datasets together
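To make that concrete, here is a minimal sketch of the kind of hybrid measure a common lookup table enables. The column names below (and Customer as the shared lookup) are illustrative assumptions rather than the exact names from the book’s sample files:

    Total Sales := SUM ( Sales[SalesAmt] )
    Total Service Calls := COUNTROWS ( 'Service Calls' )
    Service Calls per Sales Dollar :=
        DIVIDE ( [Total Service Calls], [Total Sales] )

Because both data tables relate to the same Customer lookup table, putting any Customer column on rows or on a slicer filters all three measures consistently in a single pivot.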

Lookup Tables – the Who, What, Where, When, How

And not just one – you need a few. Often, when I am working with a dataset and thinking about the lookup tables I might need, I run through the Who, What, Where, When, How. For example, for our Sales data, I might answer those questions as below and attempt to create the Lookup tables accordingly.


Who, What, Where, When, How – would become our Lookup Tables


Both data tables connected to all relevant lookup tables – ready for action!

Diagram View with Multiple Data/Lookup Tables: If you do end up with multiple data and lookup tables, your Diagram View might start to look like a spaghetti chart or a complicated integrated circuit diagram. We would still recommend that you keep your Lookup tables near the top and Data tables down below – to relate to our analogy of “relationships flow downhill”. Some people complain that they cannot see the related tables easily this way. However, understanding your data and lookup tables, and improving your discipline in using them that way, should override that concern.

If you do end up with more tables than you can keep track of, consider creating Perspectives to keep them organized for yourself and for your users. Learn more at http://ppvt.pro/groupTables

Note that you can have multiple “Who”s, multiple “When”s, and so on.

For example, for our sales transactions, we might have:

Who: Who placed the order? Whose credit card was used to pay for the transaction? Who was it shipped to?
When: When was the order placed? When was it shipped? When did it arrive? When was the customer invoiced? When did the customer finally pay?

You can read about how to handle such cases – where there could potentially be multiple relationships between two tables – in our chapter on “Complicated” Relationships (Ch 22).
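As a preview of that pattern, one common approach in Power Pivot is to keep a single active relationship (say, Order Date to the Calendar lookup) and define the other date relationships as inactive, activating them inside specific measures with USERELATIONSHIP. A minimal sketch, with hypothetical column names:

    Total Sales := SUM ( Sales[SalesAmt] )
    Total Sales by Ship Date :=
        CALCULATE ( [Total Sales], USERELATIONSHIP ( Sales[ShipDate], Calendar[Date] ) )

The first measure follows the active Order Date relationship; the second temporarily switches the filter path to the inactive Ship Date relationship.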

Matrix of Lookup and Data Tables

As your model evolves and you pull in additional datasets, these should get plugged into Lookup tables in a matrix-like fashion (the rectangular array, not the movie; well, maybe a bit like the movie 🙂).


Data sets should connect to a common set of Lookup tables

Even if the number of data tables in your model grows at a prolific rate, the lookup tables should remain a fairly stable set. Your lookup tables typically represent the core entities in your business – like Customer, Product, etc. – to which many data sets hook up. On occasion, as you bring in new datasets (which typically represent business processes), you may need to add new lookup tables as well. But that exercise should be fairly infrequent.

Not all data tables will connect to all lookup tables – only to the ones that apply to the specific dataset. That is perfectly all right; in fact, it is to be expected. For instance, in the example above, Budget does not connect to Customer or Employee, since Budget is not set at the Customer or Employee level (at least in this example scenario).

When using measures across two (or more) datasets in the same pivot (or when writing hybrid measures), just be aware that only the common lookup tables will filter both datasets. You may get invalid results if you use a lookup table that is connected to only one of the two data tables. See the example below; this is also covered in detail in Chapter 17, Multiple Data Tables, in the section ‘Multiple Data Table Gotchas’.

 


Budget data table is not connected to Customer Lookup Table,
thus you would get invalid results if you try to use them together
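One defensive trick worth knowing: a measure can check whether an unrelated lookup table is filtering the pivot and return BLANK instead of a misleadingly repeated total. A minimal sketch, with hypothetical column names:

    Total Budget := SUM ( Budget[BudgetAmt] )
    Total Budget Guarded :=
        IF ( ISCROSSFILTERED ( Customer ), BLANK (), [Total Budget] )

Since Budget has no relationship to Customer, any Customer field on rows or slicers would otherwise simply repeat the grand total on every row; the guarded version blanks out instead, making the disconnect obvious.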

Crazy Lookups and Snowflake: Too Much of a Good Thing

While Lookup tables are good, there can be too much of a good thing. I have seen client models where everything is a lookup table and the Diagram View looks like a mutant octopus.


Making every little thing into a lookup table is not good data modeling

Or models where we have cascading Lookup tables (also called a Snowflake schema).


Lookup Tables have Lookup Tables? That may not be ideal

Neither extreme is considered good design, for performance, scalability, and usability reasons (discussed next). Your lookup tables should represent something tangible and easily understood – e.g., Product, Purchase Order.

Consider all of this a guideline, a best practice. We’re not too squeamish about breaking these rules ourselves. Just be aware when you are not following the guidelines, understand why you are not doing so, and account for the possible implications.

Usability

A Star Schema (separate Data and Lookup tables) will typically offer you the best performance and allow you to scale to millions of rows and multiple data sets. But another important consideration is usability. Even for the simplest scenario, where we have only one dataset – which of the field lists below would you rather use day-to-day to build new reports and analysis?


Star Schema would be much more user friendly once you start building reports

More importantly, which field list would you want the other users who connect to your data model to be using? A Star Schema groups items into logical entities that are easier to understand – not just for you, but for all the users who will end up connecting to your data model and creating reports and analysis against it.

Think you are just creating a quick analysis model for yourself? Power Pivot is so powerful that you will often create models that last longer than you ever expect and get used by more people than you ever imagined. The model I built at Microsoft started as a simple model for my own use, but it grew over time, went on to be used by hundreds of users, and is still in operation long after I left Microsoft.

When we parachute into an organization for our typical 2-3 day consulting engagement, I still create each data model like it’s going to be the best thing since sliced bread for them. Dare I say, it often is 🙂

So dream big and aim high. Put some thought into designing your data model. It only takes a little more work and pays huge dividends in the long run.

To learn more Power Pivot and Power BI best practices and tips & tricks, get our bestselling book, Power Pivot and Power BI, at Amazon or MrExcel. A big thanks to Greg Weber at Microsoft for teaching me all I learnt about data modeling.

Power On!


Avi Singh

Avi Singh has personally experienced the transformation and empowerment that Power BI can bring – going from an Excel user to building large-scale Power BI solutions. His mission now is to share that knowledge about Power Pivot and Power BI.

This Post Has 33 Comments

  1. With big data models the Power Pivot / Power BI mapping area can get very crowded. I know there is the zoom in/out feature, but it’s still messy… is there an easier way to handle this?

    1. David, the best way to handle this is to create Perspectives. I have worked with Models approaching 100 tables. Creating different perspectives is the only way to keep yourself sane.

      1. Perspectives… I thought they were more targeted at the end consumer of the data model in their client application, to make their life easier, not at the developer in the Power Pivot back-end data model development area…?
        But I’ll check them out all the same.
        Thanks, David

  2. Nice article, but the missing piece is the detail on how you actually create the lookup tables. Still reading the new book and still reading some blogs, but still hazy on the subject.

    1. Short version: Power Query can be used to create lookup tables, particularly with its Remove Duplicates function. And as you suspected, yes, this is covered in a chapter of the new book that you have not yet reached 🙂

  3. For our organization we can’t assume that all tables, data or lookup, are coming from a source where we can use SQL to design a better star (or snowflake). Power Pivot is valuable to us because we can pull together fragmented bits of operational data from legacy systems and link them with Power Pivot’s chewing gum and baling wire.

    As Rob notes, Power Query is vital to this effort. Unfortunately it’s hard to find the time (and brain cells) to get to be good in both Power Query and Power Pivot. At least in Power Pivot we’re always operating against an Excel paradigm, but for Power Query you’re cobbling together a unique ETL solution with every legacy data source you want to incorporate. I’d turn it over to the data warehouse gurus if they weren’t already busy with their responsibilities! I appreciated the much-improved chapter 22 in the new edition because I can’t avoid having to make filters run “uphill”. Without serious chops in Power Query however I have to traverse multiple uphill and downhill relationships to make connections, usually with a few calculated columns helping out in the intermediate tables. I’m hoping there’s a better way once I get as familiar with the tools as Avi and Rob, but I’m going to need a few more books from them!

    1. GMF, have you gotten around to using Power BI Desktop? That has built-in support for “bi-directional” relationships (downhill and uphill filtering), which could be really handy in your cases. In my experience, cases where I needed that have been somewhat rare, so I haven’t been big on it.

  4. >>Think you are just creating a quick analysis model for yourself? Power Pivot is so powerful that you would often create models that last longer than you ever expect, be used by more people than you ever imagined.

    THIS. +1000.

    I use a certain model that was originally intended to solve one particular business problem, but that has Borg-ishly evolved into a much larger entity that consumes data from a variety of sources to answer a boatload of different business questions daily. In hindsight, I would’ve done things a LOT differently upfront had I even considered it would eventually grow the way it did.

    1. Justin, believe me, that lesson was learnt the hard way – same as you have. My first model started off as a garage project and went on to be used by 600+ users. Unfortunately, some of my early work had things like “Calculated Column1” or “Measure1” which I could not take out since they were being used in reports. Ugh! Nasty. Of course I know better now… even in the training classes we conduct, I don’t let my students leave names like “Calculated Column1”.

  5. Great article Avi. The more detailed explanation of lookup tables with data tables is just what I needed. This section of the book is dog-eared for me. You guys are awesome.

  6. Hi Rob,

    Thank you for an outstanding book. I currently use heavily formatted sheets for reporting purposes, so historically I have used tables for input instead of pivot tables or charts. I am also aware of Power View and Power BI, but for some of our reporting we need more flexibility. I know that I can turn my pivot table into cube formulas to move around, but it seems that these formulas are oftentimes hard to understand intuitively. What do you say to someone who wants the benefit of a data model, but finds referencing pivot tables cumbersome?

      1. Hrm, you don’t like pivots or cube formulas as output eh? What would be ideal, in your eyes? Structured References for PivotTables would be ideal. I posted some pretty robust code at Chandoo’s blog a while back that should fit the bill:
        http://chandoo.org/wp/2014/10/18/introducing-structured-references-for-pivottables/

        I’m in the process of turning this (and a lot more besides) into a commercial add-in that I’ll be launching with my book. The book isn’t specifically targeted at the PowerPivot end of the spectrum, but a lot of the tools I cover are just as helpful on PowerPivot PivotTables as they are on traditional ones. Including a revised implementation of Slicers that will also let you do all sorts of crazy WildCard filtering against Pivots that you simply can’t do out of the box. I even have someone at MS using this on their huge OLAP pivots, because the native filter takes upwards of a minute just to populate on their massive dataset, and half the time disappears as soon as it finally comes up.

        Sneak preview at http://dailydoseofexcel.com/archives/2015/11/17/filtering-pivottables-with-vba-deselect-slicers-first/

          1. Part of the issue is that whatever solution I come up with is to be shared among my colleagues, so for troubleshooting it can’t be too complicated. The Cube formulas are also lacking for pivot tables that are not fixed in size. I’m still trying to think of a solution, and if you feel there is something more appropriate, or feel that my view is short-sighted, then I’d love to hear it. So far, I’ve been using Power Query to pivot some of the data (I know, I know), along with custom formulas in M to steer the transformations, but I’m worried that this will be too slow, and I’ve really had to control the data coming in. One major issue is that all of my data is coming from .csv files instead of a database. Thanks for the feedback.

          1. Basically I use the Structured PivotTable References approach when I need a report that doesn’t look like a PivotTable. I put all the source PivotTables I need in a hidden sheet (and sometimes I make up quite a few PivotTables that contain the various extracts/combinations that I need), and then just reference the Structured PivotTable References that my code autogenerates. These get updated on refresh automatically in the blink of an eye, and this allows me to do pretty much what I want with that PivotTable output. Seems pretty bulletproof to date, just like the inbuilt Structured Table References are. But it all depends on what you are trying to achieve.

            Granted, I use these references to overcome the limitations of the non-OLAP GETPIVOTDATA function, and I understand that you PowerPIvot folks have much more that you can do with CUBE functions. But I also imagine that those CUBE functions might not be as transparent as say a simple view of a PivotTable with a clever dynamic named range pointing at the bits you need.

            Feel free to reach out to me at [email protected] if you need help with my code, or if it doesn’t play nicely with PowerPivot Pivots.

    1. Jeff, I feel your pain too. You convert the OLAP pivot to formulas, then you move the formulas around and you forget which measures apply to which formulas and which filters drive those formulas…

      What I do is create the pivot tables twice and keep a master copy as an original pivot table in a ‘reference’ tab. The other copy I convert to formulas and move around, as you say, with full flexibility, and I insert comments to remind me which original pivot table each formula relates to. This helps when you need to recall which pivots/measures/dimensions the formulas originate from, and which filters were targeted at those formulas, by simply going back to the reference tab to find the original pivot. I find it works.

      Of course, this solution is without opening a chapter on the benefits of Power BI, or even the option of moving the Power Pivot data model into SSAS for flexible SSRS reporting…

      1. Very good points as well. Power BI might work if I could print off a 20-page report like I can in Excel, but then again, I need more flexibility in the design and formatting. As for the other solutions, I unfortunately don’t have the resources for either option. Thanks for commenting.

        1. I would use GETPIVOTDATA referring to parameter cells so everything updates automatically when you adjust the parameters. If the dataset isn’t too big you could also use SUMIFS with structured references to a table. Both approaches usually work fine for me.

  7. Excellent post, guys, and this book sounds like exactly what I’ve been looking for. I’ve had a play (and been on the MS course) with Power Query, Power Pivot, Power View and Power BI, and I am wanting to venture further down this road as it’s the future. Book ordered. Looking forward to its arrival.

    PS One problem with our organisation currently is that we’re still on Excel 2010 (except me and a few others), so I’m assuming a meeting with the Head of IT to get a company-wide rollout of Office 2013, or better still 2016, is on the cards – a must if we are to encourage self-service BI. Agree?

    1. Jon, enjoy the book when it arrives 🙂 Do leave a review on Amazon when you have read enough.
      On Excel 2010/Excel 2013/Excel 2016
      My take is – who knows how long it would take to deploy Excel 2013.
      Power Pivot & Power Query with Excel 2010 can still do serious damage. Check out some of Rob’s early posts (Excel 2010 era) and hear some of his stories. He was saving companies millions of dollars with Excel 2010+Power Pivot.
      So I say, let’s start the engines! Go as hard and as fast as you can with Excel 2010. If you do start hitting roadblocks that can only be resolved with Excel 2013/2016, then by virtue of all the good work you will have done, you should have tremendous support behind you for those moves.
      By all means, have the meeting with the IT head. But don’t hold your breath till IT rolls out 2013; start the good work now with Excel 2010.

      Power On!

      1. I find that the issue isn’t so much which version of Excel (2010/2013/2016) as the fight to get a 64-bit install on your machine. Without fail, you’ll run into the 32-bit limitation as soon as you try using real-world data. (Your time learning in 32-bit with example datasets like AdventureWorks, etc. isn’t wasted, though.)

        PowerBI Desktop has been very useful in this regard, as the 64-bit version will install without caring about the “bit-ness” of your Microsoft Office.
        (The one exception I’ve found is that it can’t read MS-Access .accdb files. There’s still some link that causes the 32-bit driver to be invoked… come on Microsoft! Any work-around suggested to date is kludgy, at best; and I’d never ask someone else to use those approaches on a recurring basis.)

    2. Disagree. It is possible to encourage self-service BI without using any version of Excel. This can be done with just PowerBI.com or with Power BI Desktop and PowerBI.com

  8. When working on very large data models with many tables/sources, I’ve often thought that it would be nice to be able color code the tables based on source (blue=SQL, red=Marketing’s SharePoint list, purple=Bob’s Excel file, etc.). I’ve never been able to figure this out and have even gone so far as to take screen shots of the model and drop them into PowerPoint and add colors there for a quick-reference guide. Anyone have any ideas on how it might be possible to do this natively in the PowerPivot window?

  9. When starting with data that is a “Frankentable” (lookup information and data information in one, massive table), is it best to clean and filter my data first, and then use the “Reference” option to create my lookup tables by branching out from my filtered data table?

    For example: I have “Equipment Manufacturer” as a column on my Frankentable. The report I am creating is only relevant to one manufacturer.

    The “old” technique would be to query my source (.csv) for the data table, filter the manufacturer, delete the lookup columns, and then query the same source for the lookup table, filter the manufacturer, and keep ONLY the lookup columns and then remove duplicates.

    Would my query run faster if I imported my source, filtered the manufacturer, referenced that query for the lookup table query, and then deleted the unnecessary columns in the lookup and data tables?

    Thanks, and let me know if I am being unclear.

  10. Please tell me it is possible to get the book “Power Pivot and Power BI” for free. I need the book without money because I am a student…
