(Image Courtesy of Orbitz.com)
Disproportionately Popular International Destinations by US State – Can We Do Something Similar?
I haven’t done a DAX post in a long time, so I was thrilled to get some inspiration from an unlikely source – an Orbitz infographic designed to sell expensive vacations!
My wife loves to travel AND she loves data, so it’s no surprise that she showed me the infographic in question.
“Oh come on, California residents don’t visit the Philippines more often than any other country.” That was my first reaction. (Mexico is the overwhelming #1 destination, basically for every state… because Cancun. And because there are so many Mexican-Americans).
“Ah… but California residents DO visit the Philippines disproportionately more often than they ‘should,’ according to national averages!” That was my dawning second realization (and confirmed by the fine print of the Orbitz article, even though the article’s title suggests otherwise.)
So, how do we do that in DAX? Pretty simply, actually.
Step 1: Pick Your X, Y, and Z (State, Country, and Trips in the Orbitz Example)
“For each X, what is the most disproportionately popular Y, as determined by Z” Let’s think of it that way – in the Orbitz example, X=State, Y=Country, and Z=Trips (or Travelers, not quite sure which they used).
Your Z can be any measure, whereas your X and Y can be any columns (from any table, as long as that table is ultimately related to your data tables that produce the measure Z).
I’m going to use:
- X = Sales Territory Region (from the good old AdventureWorks data set)
- Y = Product Subcategory
- Z = Sales Dollars
But remember, you can use anything as your X and Y – demographic, time of day, sales rep, whatever, and anything as your Z – response time, inventory, failure rate, etc.
One more time, with feeling: this technique can be applied just about ANYWHERE. If you’re reading this, I promise you it has value for your business in at least one place, and probably in multiple.
Step 2: DAX Magic!
I performed this in six deliberately-separate steps/measures, so that it’s easier to copy and modify. And I’ve color-coded sections of the formula to represent X, Y, and Z:
- Orange Represents X (or the table that contains it)
- Blue Represents Y (or the table that contains it)
- Green Represents Z
[This Subcat Pct of ALL Sales]
CALCULATE([Dollar Sales], ALL(Territories)),
CALCULATE([Dollar Sales], ALL(Territories), ALL(Products))
[Region Sales Total]
:=CALCULATE([Dollar Sales], ALL(Products))
[Expected Sales Based on Avg]
:=[This Subcat Pct of ALL Sales] * [Region Sales Total]
[Delta Pct from Expected]
:=DIVIDE([Dollar Sales]-[Expected Sales Based on Avg],
[Expected Sales Based on Avg] )
TOPN(1,ALL(Products[Subcategory]), [Delta Pct from Expected])
& ” +”
[Delta Pct from Expected] )
TOPN(1,ALL(Products[Subcategory]), [Delta Pct from Expected], 1)
& ” +”
[Delta Pct from Expected] )
English Descriptions of the Formulas
There’s zero shame in just copy/pasting the pattern/formulas above without bothering to understand them. But if you’re curious, here’s a quick description of each:
- [This Subcat Pct of ALL Sales] – across all Territories combined, what % of Sales does this Subcategory represent? (And I’ve used ALL on the entire tables of Territories and Products in this case, rather than just the specific columns for Subcategory and Region, because I still want to be able to slice the final pivot or dashboard by other columns like Category and get consistent results. Trust me – I did it the other way first and got strange results that weren’t trustworthy).
- [Region Sales Total] – what’s the total dollar sales for this entire Region? (And again, using ALL over the entire Products table rather than just Subcategory).
- [Expected Sales Based on Avg] – based on the two previous measures, if this Subcategory sold precisely on its global average in this Region, this is the Dollar amount it would have sold.
- [Delta Pct from Expected] – compares what it actually sold in this Region to that “expected” dollar amount.
- [Popular] – this awesome text measure finds the name of the most disproportionately-popular Subcategory (the first CALCULATE) and appends that together with the [Delta Pct from Expected] value of that same Subcategory (the second CALCULATE). Note how both CALCULATE’s iterate over Products[Subcategory] – via TOPN in the first case and MAXX in the second. So, even though our final pivot/dashboard won’t have the Products[Subcategory] field displayed on it, we’re still looking at each one individually and evaluating them against the others.
- [Unpopular] – exactly the same as [Popular], but we turn TOPN into “Bottom N” via that last optional parameter (, 1), and we replace MAXX with MINX.
As hinted above, we can now add slicers to the pivot and see what we can do…
Unfiltered Version – Showing Results for All Time and All Parent (Product) Categories
Zero In on Just 2003, and We See Fewer Examples of Clothes and Accessories in the Unpopular Column
(That’s Actually Expected, Since We Didn’t Sell Those Categories In 2001-2002)
Further Zero In on Just Accessories – Note How No Accessories Showed Up in Popular Before We Filtered
Clear All Slicers and Drag CalendarYear Onto Rows to See How Things Change Over Time – COOL!
Download the Files!
Starting this month, we’re gonna be a lot more diligent about giving you the files behind our posts. Generally speaking, it’s just been an omission, rather than sensitive data or such – because we’re almost always using fake data to document the technique anyway. We’re trading email addresses to download the example files – we think it’s a pretty good trade 🙂
Get Your Files
Awesome Results, Dull Pivots?
Sigh… how I wish it were easy to slap conditional formats on text measures. I’m pretty sure I could make it work with a “formula” version of conditional formatting, but I’ve always hated those, and on top of that… I don’t think I can accomplish Data Bar or Color Scale via formula. So I’m not going to slog through it.
So let’s take this over to Power BI Desktop and see what we can do, shall we?
Just Import the Power Pivot Workbook
Remember, you don’t start from scratch, you can “grab” your existing Power Pivot workbook into Power BI Desktop:
Import Your Power Pivot Workbook Into Power BI
Note that you can also go the reverse direction (build in Power BI Desktop, then connect Excel to it) – whether using the PowerBI.com cloud service, or Matt’s nifty desktop macro.
Ta-Da! Interactive! (Go ahead, touch my PBIX. Right here. In the browser. It’s fine.)
Link to fullscreen version: Click Here.
How’d I Do It? (The Short Version)
First of all, HUGE thanks to our very own Reid Havens for taking my first couple PBIX reports and making them a LOT more appealing. All those years spent learning to make Excel reports look nice eventually got me to someplace decent, but the Power BI canvas represents a “reset” for me. The final results can look amazing of course, better than Excel for sure – it just takes a different sort of “knack.” As of today, Reid has it. I do not. But I’ll get better… mostly by watching him, I suspect.
Second, I assigned cities (via calculated column) to each Region in the model, so that Power BI’s maps would have someplace to “anchor” regions like “Northwest.”
Third, I grabbed the Chiclet Slicer from the Power BI Custom Visuals Gallery. 100% free (and easy to add to your PBIX file). This let me assign icons to slicer tiles.
I had to find/create icons for the Categories of course (and the Regions), but that’s totally worth it.
And of course, the awesome Tooltips dropzone accepts my text measures, which is freaking awesome:
Tooltips Dropzone for the Win!
Pretty darn cool, I’m gonna stop here and see what you folks have to say in the comments. (I had a rant here that I wanted to share about the Card visualization, but I want to make sure I fully understand my own complaint before sharing it – measure twice, cut once, as carpenters say).