skip to Main Content

power pivot to power bi

A Power BI Technique Mined from the Power Pivot Archives

Below, you will find one of our all-time favorite Power BI techniques. In the ten (yes ten) years we’ve been operating this site, we’ve written over 1,000 articles on Power BI.  Hard to imagine - even for us - but true.

Years ago, we first wrote up this technique in the context of Power Pivot – Power BI’s cousin and predecessor, which we like to call “Power BI in Excel.” 

Since the two products share the same brains (DAX and M), this technique is more relevant today than ever. Everything below is 100% legit for Power BI – the same data model “shape,” the same DAX, etc. – but now you can use it to power up your Power BI visuals, and not just PivotTables.  Enjoy! Smile


Post by Rob Collie

From Last Week’s Client Work

Last week a client asked us to solve a somewhat unusual problem:  given any two lists of Twitter followers, tell us how many followers “overlap” between the two lists.

Two Lists of Twitter Followers:  How Do We Find the Overlap Using Power Pivot / Power BI / DAX?

How Many of Han Solo’s Followers Also Follow Leia Organa, and Vice Versa?
(Randomly-generated Twitter handles are funny.  I particularly like “@Gommo” and “@Xxfok”)

Loading the Data:  Using Power Query

Let’s use Power Query to perform the import this time, both because we’re using PQ a lot more around here now that we have Power Update, and because we’re gonna need PQ for the more complex steps later.

Note that all of the steps below are performed using Excel 2013.  (I find Power Query to be a bit too clumsy in Excel 2010.)

Power Query, aka Power BI Data Import

Importing from a Table Using Power Query:  Step 1
(Unchecked “has headers” because of the “Han Solo’s Followers” Row)

Now we trim that top row (“Han Solo’s Followers”) off:

Power Query, aka Power BI Data Import Power Query, aka Power BI Data Import

Trimming the Top Row Off of our Data in Power Query

And then, we use the “new” top row as the header row:

Power Query, aka Power BI Data Import

Which gives us this in the Power Query editor:

Power Query, aka Power BI Data Import

Data is Looking Good.  Note that We’ve Renamed the Query from “Table3” to “Han”
(Also note the list of steps that Power Query has saved up for us, to be replayed automatically on refresh!)

Set Your Data Types in Power Query!

image

Heed the following advice now, or suffer the wrath of a painful bug later.  The bug is the result of, ahem, a Power Struggle between the Power Tools (Power Query and Power Pivot), where each wants to be “in charge” of things like table name, column names, and data types.

In short, if you are EVER going to change ANY of these, perform the change in Power Query, NOT Power Pivot:

  1. Table Name – change this in PQ’s Query Settings, as pictured above.  (Don’t right click and rename the resulting sheet tab/table in Power Pivot).
  2. Column names – right click and rename them in Power Query (don’t right click and rename in Power Pivot).
  3. Deleting Columns – again, delete them from the Power Query side, using the Delete key or the Remove Columns button on the ribbon.  Don’t delete on the PP side.
  4. Data types – use the Data Type dropdown on PQ’s Query Editor ribbon (pictured below).  (Don’t use the similar dropdown’s on Power Pivot’s ribbon).

If you use Power Pivot to change any of those, even once, chances are good that PP will now think it’s “in charge” of ALL changes to that table from that point forward, and then if you EVER change ANYTHING in Power Query (even things OTHER than table/column names or data types, like filtering out bad rows), you are suddenly confronted with this problem, for which in Excel 2013 the only simple “fix” is to delete your entire table from Power Pivot and start over again.  Blech.

So let’s set those data types now, just to be safe in case we need them properly set later, OK?

Power Query, aka Power BI Data Import   Power Query, aka Power BI Data Import

Set the Date Data Type for “Follow Date”, But the Other Column is Already OK

Now, load the query directly into the Data Model (aka Power Pivot):

Using Load To so that data lands directly in Data Model / Power Pivot

Duplicate and Modify the Query to Import Leia’s Followers

We need to run all of the same steps to import Leia’s followers, so rather than starting from scratch, let’s Duplicate the Han query:

Power Query, aka Power BI Data Import

And then we edit the duplicate query:

Power Query, aka Power BI Data Import

Again, we load the Leia table straight into the Data Model (Power Pivot), yielding:

Follower Lists in Power Pivot / Power BI / Data Model

Simple Solution Using Lookupvalue

Let’s do this the easy way first.  In the Han table, add this calculated column:

A Simple Answer to the Overlapping/Shared Follower Problem in DAX

[MatchingFollower] :=

LOOKUPVALUE(Leia[Follower],Leia[Follower],Han[Follower])

That results in mostly empty values, since most of Han’s followers are NOT also in Leia’s follower list, but there definitely ARE some matches:

A Simple Answer to the Overlapping/Shared Follower Problem

Even better, we can modify the calc column formula so that it returns 1 for matches and 0 for non-matches:

=IF(LEN(LOOKUPVALUE(Leia[Follower],Leia[Follower],Han[Follower]))>0,1,0)

So now we can use this column as the basis for a measure that counts overlap:

[Leia Followers Who Also Follow Han] :=

SUM(Han[MatchingFollower])

Then we can, say, add a Month column to the Han table:

image

And now we can pivot/summarize:

A Simple Answer to the Overlapping/Shared Follower Problem in DAX

Of course, we could obviously now do the reverse steps for Leia – add the LOOKUPVALUE column, the measure, etc., but we won’t show that here.

(Note that once we DO duplicate the formulas, we still end up with 211 shared followers, so the extra work wouldn’t yield much of an interesting result.)


Interested in Learning How to Do this Kind of Thing?

power query

Our New Online Power Query Course just launched this year. You’ll learn Power Query from the best in the business, two Excel MVPs Ken Puls and Miguel Escobar. We’ve included 7 Hours of Self-Paced Video over 31 Courses, with lots of example files, and a Digital Badge powered by Acclaim.

CLICK HERE to claim your spot.


The Hard (aka Dynamic/Flexible) Way

Let’s say we don’t want to take the easy way out. We want to be hard on ourselves, for reasons that may become clear later.

We can alternatively approach this as if it were a classic “many to many” problem:

The Overlapping/Shared Follower Problem in Power BI / Power Pivot:  The Dynamic Way

Han Followers, Leia Followers, and a “Bridge” Table, MergedFollowers
(Which is the Union of All Han Followers Plus Leia Followers)

So, How Do We Create the MergedFollowers table?  Power Query of course!

Power Query, aka Power BI Data Import

Using Append from the Power Query Ribbon
(Creates One Long Table – Han and Leia tables “stacked” vertically)

Power Query, aka Power BI Data Import - Remove Duplicates

Then we Remove Duplicates from the Follower Column
(Removes 211 duplicate rows in our case)

Power Query, aka Power BI Data Import

Lastly We Rename the Query

Then we load into the Data Model, create the relationships, and we get:

The Overlapping/Shared Follower Problem in Power BI / Power Pivot:  The Dynamic Way

It is VERY important that the relationships point in the right direction.  Han and Leia should be your Lookup tables.  Don’t get that backwards or nothing subsequent will work.

Note that we ALSO should have removed the [Follow Date] column from the MergedFollowers table – we don’t need it, and it’s misleading:  for the duplicates, we’ve arbitrarily been “left” with the Han or the Leia follow date, not some sort of merge or average or whatever.  (Yes, I noticed this right before posting, that I’d left the column in by accident).

No Calculated Columns in the Hard Way!

Rather than adding calc columns and then summing them up, in this approach our measures can follow the Many to Many pattern, or at least a close cousin thereof.

First, let’s add some basic measures (calculated fields) for Han Followers and Leia Followers, as well as the overall merged population:

Han Followers :=

COUNTROWS(Han)

Leia Followers :=

COUNTROWS(Leia)

Total Follower Population :=

COUNTROWS(MergedFollowers)

Yielding this simple pivot:

The Overlapping/Shared Follower Problem in Power BI / Power Pivot:  The Dynamic Way

Now we add the “magic” measure…

Han and Leia Shared Followers :=

CALCULATE([Total Follower Population], Han, Leia)

Which, loosely translated to the extent I loosely understand it, is “give me the [Total Follower Population] but enforce matches from both the Han and Leia tables.”

And that gives us…

The Overlapping/Shared Follower Problem in Power BI / Power Pivot:  The Dynamic Way

WooHoo!  The Same Answer We Had From the Easy Way!
(Aren’t you glad we burned all that extra effort?)

Advantages of the Hard Way?

Simply put, the hard way is more Dynamic.

First off, as we add more tables, like the Luke table, we don’t want to have to add pair-wise calc columns for Luke-Han overlap and Luke-Leia overlap, and then later Lando-Han, Lando-Leia, and Lando-Luke (when we add the charismatic Lando and his millions of followers to the mix).

Of course, even in the Hard way, we DO still have to add pair-wise measures, so, um… yeah.  That’s not a tremendous savings.

Here’s where it starts to get interesting:

The Overlapping/Shared Follower Problem in Power BI / Power Pivot:  The Dynamic Way

 

If we have properties of followers, and those properties can be DIFFERENT, for a single follower, depending on their relationship with Han or Leia, the Hard way takes care of us.

For this example I just invented a column in each of the Han and Leia tables:

image image

Han and Leia Tables Now Each Have a Column for ActivityLevel
(Active = Engages Frequently, Dormant = Used to Engage, Went Quiet and Lurker = Never Has Engaged)

The formulas for those columns are not important – they are just randomly-generated.

The important thing is that a SINGLE USER can be a Lurker for Han but be Active for Leia, or vice versa, etc.

And our Hard Way measure lets us build a pivot like the one above, where a simple LOOKUPVALUE, being statically determined by “is this user in both lists”, will not help us.

There are a MILLION things we could continue on and explore at this point, like percentage overlap (which is just a simple DIVIDE measure), displaying the lists of users (also pretty simple), but wow, this is one LONG post already.  So let’s call it quits for now.

Rob Collie

One of the original engineering leaders behind Power BI and Power Pivot during his 14-year career at Microsoft, Rob Collie founded a consulting company in 2013 that is 100% devoted to “the new way forward” made possible by Power BI and its related technologies. Since 2013, PowerPivotPro has rapidly grown to become the leading firm in the industry, pioneering an agile, results-first methodology never before seen in the Business Intelligence space. A sought-after public speaker and author of the #1-selling Power BI book, Rob and his team would like to help you revolutionize your business and your career.

This Post Has 10 Comments
  1. Great post Rob. An alternative dynamic approach is possible with just the two tables of followers:

    Shared Followers:=
    calculate
    (
    countrows(Han),
    filter
    (
    Han,
    contains
    (
    Leia,
    Leia[Follower],
    Han[Follower]
    )
    )
    )

    Shared Followers Same Month:=
    calculate
    (
    countrows(Han),
    filter
    (
    Han,
    contains
    (
    Leia,
    Leia[Follower],
    Han[Follower],
    Leia[Follow Month],
    Han[Follow Month]
    )
    )
    )

    Shared Followers Different Month:=
    calculate
    (
    countrows(Han),
    filter
    (
    Han,
    contains
    (
    Leia,
    Leia[Follower],
    Han[Follower]
    ) &&
    not
    (
    contains
    (
    Leia,
    Leia[Follower],
    Han[Follower],
    Leia[Follow Month],
    Han[Follow Month]
    )
    )
    )
    )

    Not Shared Followers:=
    calculate
    (
    countrows(Han),
    filter
    (
    Han,
    not
    (
    contains
    (
    Leia,
    Leia[Follower],
    Han[Follower]
    )
    )
    )
    )

    Follow Month is a field in each table which is simply =eomonth([Follow Date],0)

    I would suggest you could also get around the need for measure pairs by holding two complete fact tables (substitute Han for Fact1 and Leia for Fact2) with all followers and the associated name (Han, Leia etc) in the table as a new field ie union them together. Naturally you would need to filter out where the names are equal in both tables at run time as comparing Han to Han would deliver obvious results. This approach is also VERY quick and removes the need for M2M relationships.

    If you also have a bridge table of all follower IDs (distinct list) and join to the fact tables Fact1 and Fact2 to Bridge (opposite way to your example), you can plot these measures on a pivot with follower on row and easily see where those followers drop into these measures. Fact1[Name] and Fact2[Name] as slicers and the measures will compare which ever name pair you select.

    Hope this helps give even more fun to this subject!

    1. Thanks DC!

      Your suggestions are similar, in a way, to the posts I did a long time ago on “CONTAINSX”:

      https://powerpivotpro.com/2014/01/containsx-finding-if-a-value-in-table-1-has-a-matching-value-in-table-2/
      https://powerpivotpro.com/2014/01/containsx-revisited-what-was-the-match/

      Perf-wise, my gut instinct was that the M2M approach was going to be faster than iterating with FILTER or SUMX (and text-parsing at each step), so I didn’t go down that road.

      Of course, I did absolutely nothing to confirm said instinct. (I don’t use DAX Studio or Profiler like *some* people I know, because hey, I’m super slow and stubborn to adopt new things – not a good thing, but it’s the simple truth.)

      The single fact table idea is compelling. Seems like I would still need some sort of disconnected slicer tables to properly leverage it, agreed? (Again, just speaking from instinct, not actual thinking).

      1. The real “bummer” about this problem-space is that we end up with Lookup/Dimension tables that are Data/Fact table-sized. The work I’m doing with our client involves 100k+ followers for each account for instance, and I expect performance to start degrading once we start getting into more interesting corners of the analysis.

        I was halfway expecting you to break out some sort of fancy SUMMARIZE or similar and tell me that’s 10,000 times faster, perf-wise.

        And actually, I *still* kinda think you may come up with something like that 🙂

        1. Ha! Well, 100k+ lookup, summarize, contains, crossjoiny thing. Sounds like a fun challenge! 100k unique values (well, actually more than that considering that it might be 100k+ for each) are going to provide a headwind to all of these approaches, I think. But, they’re the sort of stats we all like playing with to see which approach really shows some muscle. Of course, we should possibly dabble with 2016’s natural joins……err, should we…!!?

          Can you smash up a sizeable desensitised workbook for us to hack around with?

          1. Hey guys,
            Great examples of the impact data modelling has, really enjoyed it.

            When it comes to performance, strict denormalization is fastest here: one table + one measure = lightning speed 🙂

            Have a look: https://onedrive.live.com/edit.aspx?cid=de165ddf5d02daff&page=view&resid=DE165DDF5D02DAFF!10930&parId=DE165DDF5D02DAFF!107&app=Excel

            Example data is set to 10k for file size, but just check it out with 100k rows: In sheet “Parameter” cell B2 type in the number of the desired rows of the input data (will be generated automatically).

            Then just press: Data – refresh All

            But take care: 500k will consume 4GB RAM and take 1 Min to generate the input lists, 1 Mio will eat up 12 GB RAM (& 3 Mins to generate)! (But pivots still lightning fast) – So you’d better close other applications first before playing around with it 🙂

      2. Yeah, disconnected tables would almost certainly be required to drive the selections of the 2 (or more) names to be compared.

  2. Using names like “Han Solo” & “”Leia Organa” is too confusing. You could have just used 1 Male Name & another Female name, the reading would have been little easier. Also please use simple names 🙂

    1. OK, so how about next time I call them Grumpy and Picky? 🙂

      The idea here is to inject a little fun into the process, for everyone – readers yes, and also me. It took about 6 hours total to write that post and it’s good to have it feel less like work and more like fun.

      I understand that may not work for everyone, but I think the majority do prefer it, and certainly it makes it more likely that *I* will follow through and do a good job, so… fictional names from scifi and fantasy will continue 🙂

      I do sincerely appreciate the feedback. I’m just explaining why this one particular point is one I disagree with.

Leave a Comment or Question