skip to Main Content

power pivot to power bi

A Power BI Technique Mined from the Power Pivot Archives

Below, you will find one of our all-time favorite Power BI techniques. In the ten (yes ten) years we’ve been operating this site, we’ve written over 1,000 articles on Power BI.  Hard to imagine - even for us - but true.

Years ago, we first wrote up this technique in the context of Power Pivot – Power BI’s cousin and predecessor, which we like to call “Power BI in Excel.” 

Since the two products share the same brains (DAX and M), this technique is more relevant today than ever. Everything below is 100% legit for Power BI – the same data model “shape,” the same DAX, etc. – but now you can use it to power up your Power BI visuals, and not just PivotTables.  Enjoy! Smile


Post by Rob Collie

Repeat Customers in Power Pivot / DAX:  By Number of Transaction Lines, By Number of Distinct Orders, and Allowing for Cross-Year Return Customers

In 2004, There Were 2,561 Customers Who Bought Something in the Southwest.
But How Many of Those Were Repeat Customers?  Depending on How We
Define “Repeat,” We Can Get at Least Three Different Answers.

A Right Turn at Albuquerque…

I sat down today to write about “Disconnected Cube Formulas” – yes, you heard that right.  A brand new technique that I think has some pretty exciting (yet admittedly narrow) applications.

But along the way, like Bugs Bunny, I ended up doing something at least as interesting.  So let’s do that one first.

Setting Up the Problem

I have four relevant tables:  Territories, Customers, Calendar, and Sales:

image

The first three are Lookup (aka Dimension) tables, and Sales is a Data (aka Fact) table.

Active Customers is a pretty easy formula:

  [Active Customers]:=

   DISTINCTCOUNT(Sales[CustomerKey])

 

We use DISTINCTCOUNT against the CustomerKey column in the Sales table, rather than COUNTROWS of Customers, since the latter would count all customers, whether they bought something or not.  And a simple COUNTROWS of the Sales table would treat every single line of the Sales table as if it were a different customer, which is obviously not true, because some customers DO buy more than one thing.  DISCTINCTCOUNT of the CustomerKey column in the Sales table gives us precisely what we want.

And when we filter the pivot to 2004, we only get the count of customers who bought something in 2004.  OK, so far so good.

Repeat Customers – Version 1

If we define “Repeat Customer” as “A Customer Who Has More than One Line in the Sales Table,” we end up with a formula like this:

  [Repeat Cust v1]:=

   COUNTROWS(FILTER(Customers, CALCULATE(COUNTROWS(Sales))>1))

So, we use the FILTER function to temporarily create a copy of the Customers table (that exists only during the calculation of this measure, behind the scenes!) and that temporary version of Customers will only contain rows for which CALCULATE(COUNTROWS(Sales))>1.

Whoa!  Why CALCULATE of COUNTROWS instead of just COUNTROWS???  It’s very strange, I agree, to see CALCULATE without any “filter” arguments.

The short version here is that, without the CALCULATE, COUNTROWS(Sales) would not “respect” each row of Customers, and count up rows in Sales regardless of which customer they are from.  Which would result in basically every row of the Customers table passing the test, even if they haven’t bought anything.

By applying CALCULATE, we allow the Sales table to respect the filters coming from the Customers table.  The fancy phrase for this is “promoting row context to also become a filter context,” and.

This is NOT something that you normally have to know about the CALCULATE function.  I’d file it firmly under Things You Might Learn in Year Two.  I didn’t “get” this either, for a long time, and that didn’t stop me from using CALCULATE to death.  So for now, you can just follow the pattern happily Smile

(More on this topic is located in Matt’s excellent post.)

Version 1 Sucks Though Smile

Sometimes, a single customer buys multiple products at the same time.  If they buy multiple products in a single order, and we never see them again, we probably don’t want to count them as a Repeat Customer, do we?

A Customer With a Multple-Line Order

From the Sales Table:  Customer 18759 Bought 3 Different Products as Part of One Order.
And We Don’t Want to Count Multi-Line Orders as a Repeat Customer.

To fix this, we want to count customers who “appear” in multiple different orders!  So, something like this…

  [Repeat Cust v2]:=

   COUNTROWS(
      FILTER(Customers,
             CALCULATE(DISTINCTCOUNT(Sales[SalesOrderNumber]))>1
            )
            )

It still uses the “CALCULATE to promote row context into filter context” trick, but now, rather than enforce “more than one row,” we want to enforce “more than one different value for SalesOrderNumber,” which is equivalent to “more than one order.”

Version 2 Only Counts Repeats Within the Filtered Timeframe

The only “problem” with version 2 is that, when we filter to 2004 for instance, a customer must have had multiple orders in 2004 alone.  What if they placed an order in 2003, and then a single order in 2004?  If you want them to be counted, we need to do something like…

  [Repeat Cust v3]:=

   COUNTROWS(
             FILTER(Customers,
                    CALCULATE(DISTINCTCOUNT(Sales[SalesOrderNumber]),
                              ALL(Calendar)) > 1    &&
                    CALCULATE(COUNTROWS(Sales)) > 0
                   )
             )

The yellow highlighted part says “a customer must have appeared in multiple orders, ignoring any filters on the Calendar table.”  In other words, even if the pivot is filtered to 2004, a customer will “pass” the yellow test even if all of their prior orders occurred in other years (thanks to the ALL).

The green highlighted part just says “a customer must also appear at least once in the Sales data within the current time frame.”  Since we are NOT using an ALL inside that CALCULATE, the Year=2004 filter from the pivot WILL be respected (as always).

The “&&” in orange is just the AND operator, which says “a row in the Customers table must pass BOTH the yellow test and the green test.”

***Update to V3 Formula

In the comments section, Ruve1k correctly pointed out that the formula above would return incorrect results if I set the Year to 2003.  A customer who bought once in 2003 (and that was her first purchase ever), and then bought again in 2004, should NOT count as a repeat customer in 2003.

Totally true, and the reason why I didn’t contemplate that possibility?  It has everything to do with this post NOT being the post I originally set out to write.  In the context of that other post (which was forecasting related), filtering the pivot to the “max” year was the only thing you would EVER do.  Not so with Repeat Customers of course.

So here’s the new and improved version, that now prevents the “all time” portion of the formula (the part originally in yellow above) to never look past the current Calendar context the pivot is set to.

COUNTROWS(
          FILTER(Customers,
                 CALCULATE(DISTINCTCOUNT(Sales[SalesOrderNumber]),
                           FILTER(ALL(Calendar),
                                  Calendar[Date]<MAX(Calendar[Date])
                                 )
                           )>1 &&
                 CALCULATE(COUNTROWS(Sales))>0
                 )
          )

Version 3 Rocks Smile

Repeat Customers in Power Pivot / DAX:  By Number of Transaction Lines, By Number of Distinct Orders, and Allowing for Cross-Year Return Customers

Ta-DA!

Note how much more “restrictive” Version 2 turned out to be than v3!  In the Northwest region alone, there are 7x as many Repeat Customers according to version 2 as according to version 3.

Download the Workbook

Download the Workbook Here

Rob Collie

One of the original engineering leaders behind Power BI and Power Pivot during his 14-year career at Microsoft, Rob Collie founded a consulting company in 2013 that is 100% devoted to “the new way forward” made possible by Power BI and its related technologies. Since 2013, PowerPivotPro has rapidly grown to become the leading firm in the industry, pioneering an agile, results-first methodology never before seen in the Business Intelligence space. A sought-after public speaker and author of the #1-selling Power BI book, Rob and his team would like to help you revolutionize your business and your career.

This Post Has 22 Comments
  1. Rob, it seems that your v3 calc would treat a customer who purchased again in 2005 as a repeat customer in 2004. I think ALL(Calendar) is a bit too liberal and you will need to restrict it to dates prior to the maximum date in your current context.

    1. Boy do I dread the “Ruve1k” comments. Always pointing out my slip ups 🙂

      In this case though, 2004 is my max year, so I didn’t worry about 2005.

      1. So if the customer bought once in 2003 and once in 2004 then he’d be counted as a repeat customer in 2003 as well because of his future purchase in 2004. The problem will show up anytime your filter context is not your max year.

        Funny, I have the exact opposite reaction to your posts and comments. I find them quite delightful. 🙂

        1. I understood that implication (about 2003), as well. I was just viewing this as a measure that I would only be using in the max year context.

          Which, of course, is not a valid assumption to make, most of the time.

          The “back story” to this post was that I was starting to write a post about forecasting scenarios, one where we’d use disconnected cube formulas (!) to let users type their own inputs into cells (rather than use slicers), and THAT scenario was obviously going to be focused on “most recent year” since it’s forecasting. So it ALL MAKES SENSE inside my weird little head, just not to anyone else 🙂

          It’s not that I dread hearing from you sir. It’s just that 99% of the time when I do, it’s because I slipped up. “Comment from Ruve1k” now immediately triggers the “oh boy what did I mess up THIS time?” thought. You yourself, are delightful 🙂

          All right, I will fix the formula 🙂

          1. Awesome exchange here. Really enjoy the interactions / responses to comments from Rob and others who post content, learn nearly as much as the original post. Cheers, Phil

          2. Formula is now fixed in the post (well, actually I added a new section, and left the old section alone, to reflect the evolutionary process). Uploading the new workbook now.

  2. I apologize in advance if this is nitpicking too much with the formula.

    But I believe the last version would still not show a repeat customer for, say, 2004 (and prior) if they made their second, repeat purchase on 12/31/2004. The part: Calendar[Date]<MAX(Calendar[Date]) would exclude 12/31/2004 from the filter. Maybe needs to be Calendar[Date]<=MAX(Calendar[Date]).

    The chances of this happening is very remote; but remote is not never. Main reason I bring this up is because it is these types of logic errors that can very hard to pinpoint. The bigger the solution, the harder to find.

    Thanks to all for the tips/comments. I read them all.

    1. Reuvain, is that you under another account? 🙂

      You are correct, sir. It *should* be less than or equal to, rather than less than. I’m sadly too tired and jammed up with other work at the moment to fix it up in the post and the workbook though. Hopefully I will remember and circle back in a day or two.

      Good catch. Seriously 🙂

  3. Excellent topic and great timing for me as I was contemplating a similar measure. I appreciate having the various versions presented. It is very helpful for a newbie like me to see the evolution and options.

  4. I am anxious to look at the technique presented here but I get an ‘Unable to upgrade the Data Model’ message while trying to convert it to 2013.

    For those in the comments with same situation, have you been able to update the workbook yet?

    Thx

  5. Hi! I am learning DAX and I have a “big” question related to the context of the topic (i would love to understand) … Why in this measure of new customer, they (Alberto Ferrari and Marco Russo) use Calendar[FullDate] <= MIN (Calendar[FullDate])

    [NEW CUSTOMER]:= CALCULATE (
    COUNTROWS (Customers);
    FILTER (
    Customers;
    CALCULATE (
    COUNTROWS (Sales);

    FILTER ( ALL (Calendar); Calendar[FullDate] 0
    )
    )

    When i see this (having in mind just one customer) i think that the MIN function is going to give me the earliest date So the dates before are 0 …

    Saludos desde Chile!

  6. I have a table of customer Order and Sales information. I want to identify customers that have not ordered in 18 months. The fields in the table are:

    Customer ID, Customer Name, Invoice #, Sales Date, Sales Amount, Order Date, Order Number, Purchase Order Number

    For any record where the Sales Date occurs in September 2015, I want to evaluate the table to determine if the ordering customer, using Customer ID, had a sale between March 1st 2014 – Aug 31st 2015. If there is no sale within that time frame I would like a value in a new field = “New in Sept 2015”

    For any record where the Sales Date occurs in October 2015, I want to evaluate the table to determine if the ordering customer, using Customer ID, had a sale between April 1st 2014 – Sept 30st 2015. If there is no sale within that time frame I would like a value in a new field = “New in Oct 2015”

    I imagine I would use some variation of this formula, correct?

  7. Hi Rob, Can you give me more presice definition of your “Repeat Customer” measure defined as
    “A Customer Who Has More than One Line in the Sales Table,”? I find it very similar to “Returning Customers” defined as “customers who bought at least one of the selected products in the past (no matter when)” by Alberto Ferrari and Marco Russo. I computed both measures and they give different results, though the results are very close.

  8. Every time I see a formula for a complex question from Rob, I wait for comments from people (especially from Marco / Alberto) for any slip-ups done by Rob. Not that Rob makes mistakes always. His formulas always more readable/friendly, at least Excel pros won’t feel scary who travel on the DAX road.

    The formulas for Lost Customer/Returning Customers/Recovered Customers from Alberto/Marco are still scary to touch for most of the Excel Pros. I am just waiting for some one to make those formulas refined one day. It is too much for me to try my self.
    But there’s difference in the ingredients they (Rob&the Italians )use for the same formula. I call them “Rob’s way” and “Italians way” in my mind.

Leave a Comment or Question