Quick Data Prep Trick: Use a Flat Pivot!

February 11, 2010

Today I had a huge source table that looked like this:

Store Month Payment Type Some Number
A Jan Cash 10
A Jan Credit 12
A Jan Debit 13
A Feb Cash 14
A Feb Credit 17
A Feb Debit 9
B Jan Cash 21
B Jan Credit 12
B Jan Debit 6

That went on for about 100K rows.  Thing is, I didn’t care about the Payment Type, and never would.  (My data set was a little different than this, so it’s hard to explain why I’d never care).

What I wanted was a much smaller source table that collapsed the PaymentType column and aggregated Some Number:

Store Month Sales
A Jan 35
A Feb 40
B Jan 39

Which would end up being about 1% of the rows.

My first thought was to create another table in PowerPivot and then use formulas to aggregate data from the big table into the new, small table.

But how to populate the first two columns of that new table?  Just getting all of the distinct pairs of Store/Month was going to be tedious work – I had > 200 stores!  I was contemplating VBA macros.

Then it hit me:  Flat Pivots!  (Kasper and I had been talking about them today for another reason – I think he’s going to share that soon).

What’s a Flat Pivot?

Flat PivotIt’s a new feature in Excel 2010.  Once you insert a PivotTable, you can set it to appear flat.

Thing is, I forget how you do that, because PowerPivot makes it SUPER easy at Pivot creation time, as shown here at right.

Drag Store and Month onto Rows, and heck, throw Sales into the Values area!  That yielded a PivotTable like this one below:

       Flat Pivot Results 1

Gotta get rid of those pesky subtotals now.  On the Design tab of the ribbon is the button I need:

No Subtotals
Which yields this as my Pivot:

Flat Pivot No SubtotalsPowerPivot Paste New Table ButtonCopy that to the clipboard and switch back over to the PowerPivot window.

On the main ribbon tab, click the big fat Paste button:

That yields a new table in PowerPivot:

Results Table in PowerPivot Window

It has all the unique combos of Store/Month, AND it already has the aggregated numerical column!  No formulas needed.

Took about 2 minutes.  Keep this in mind, it will save you a TON of time someday.

Hey, your PowerPivot Window is Green!!!

Why yes.  Yes it is.  Jealous?  Cough cough, GREEN with envy?

Don’t go looking for the setting.  It’s not there yet.  This is an improvement in recent builds of the addin.  You might also have noticed that the PowerPivot ribbon visuals are a little crisper.

Coming soon to a desktop near you :)


Jewelry Mashup for Alpha Geek Challenge

February 11, 2010

PowerPivotBookHello everyone. Rob invited me to do a guest post to talk about my Wives vs. Girlfriends entry in the Alpha Geek Challenge. That is a politically incorrect and provocative title for a real data analysis mash-up that I did with PowerPivot.

I spent the last 33 days locked in a room finishing five manuscripts for books about Excel 2010. One of those books will be PowerPivot For the Data Analyst from QUE. In the course of writing that book, I had to generate a lot of sample data sets to use as examples.

Some interesting things along the way:
1) I had some daily sales data for 2 ladies boutique stores. One is in a mall. The other is in an airport located 10 miles away. 3 years of mall history, only 2+ years of airport history because that store opened after the mall location.

2) Beginning theory: when there are rain delays, people have more time in the airport, so they buy more.

3) Beginning theory: when it gets hot, people go to the mall.

WeatherUnderground4) I knew that Rob had found weather data from somewhere, but as I started searching around, I could not find any good free weather databases. I did find a page a WeatherUnderground.com where you could retrieve weather statistics for one day for one city. Luckily, both of these stores were in the same city, so I only had to gather data for one city.

Unfortunately, getting three years of data meant visiting this web site 1095 times! Excel VBA lets you use the brute force approach, so while I was typing a chapter on one laptop, I set up a VBA macro on the other laptop to run an Excel Web Query over and over and over, changing the date in the URL each time. As each web page would come back to Excel, I would use VLOOKUP to pluck out the rainfall and the high temperature for the day.

5) Getting the data into PowerPivot was easy. Two linked tables. One relationship by date.

6) In the PowerPivot grid, I added some calculated columns with DAX:

  • To group the temperature data into buckets, I used a column called “Highs in the” and a DAX calculation of =INT(Weather[High Temp]/10)*10.
  • To group the daily dates into weekdays, I used =FORMAT(Sales[Date],”ddd”). This returns Mon, Tue, Wed, etc. You have to use Colin’s trick from the FAQ to actually sort those into custom date sequence. (Thanks Colin!)
  • Back in Excel, I had used the range version of VLOOKUP to assign classifications based on rain. The rainfall in inches was classified into “0-None”, “1-Trace”, on up to “4-Hurricane”.
  • I did not realize it when I started, but as I was doing DAX Measures, I found the need to have some concatenated fields so that I could do a count distinct. =CONCATENATE(Sales[Location],Sales[Date]) and =concatenate(Sales[Location],Sales[WeekdayName]).

7) I used the “four chart” layout in PowerPivot. I ended up using only three of those charts in the contest entry.

8) The important Measure that was enabled by DAX was “Sales by store by day that falls into the current filter context”.  There were some days in 2007 where only one store was open, and many days where 2 stores were
open. DAX Measures let me calculate total sales / (# of stores open on this day). That DAX Measure is: =sum(Sales[Net Sales])/Countrows(DISTINCT(Sales[LocationDays]))

9) During discovery, I used slicers, but for the contest entry, I needed different charts to have different filters, so I changed the slicers back to report filters so that one chart could be for the airport and another chart could be for the mall.

Surprising results:
As rainfall increases, sales plummet the airport. I guess that when people are freaking about because of flight delays, they are not busy shopping.

Chart1

Higher temperatures mean lower sales at the mall. When I was a kid, we would flock to the mall because they had air conditioning. Now, the people buying these high-end handbags probably already have air conditioning and no longer need to head to the mall.

Chart2

As I was looking for other trends, I ran one by weekday at the airport. Friday was the peak sales day during the week, probably indicating that business travelers are picking up a piece of jewelry on their way back home to their wife. But; I was surprised to see that the one day which beat Friday was Sunday. Sunday? If the business travelers are buying gifts for their wives on Friday, WHO the heck are they buying gifts for on Sunday? One theory is noted in the chart. In reality, that Sunday spike is probably from business women who are accessorizing up before heading out for their business trip. It would probably be interesting to break that Sunday data by hour to see if the sales are early in the day when people are heading home from weekend trips to the beach or on Sunday afternoon/evening when the business commuters are heading back out.

Once I had reached some interesting conclusions about the data, I spent some time formatting the workbook to not look like Excel. I got rid of a lot of the extra stuff that is standard on Excel charts, but then added some graphics (the rain is a pattern fill on the chart area in the left graph and the Delay Delay Delay graphic is a picture fill applied to the plot area. To use pictures in individual columns of the other charts, you have to click on a column twice. Once to select the series, and then a second time to select that one point. Then, use Format, Fill, from Picture.

I suppose the surprising PowerPivot lesson here was that the slicers were really helpful to me when analyzing the data, looking for interesting trends. Once I had decided on the message that I wanted to convey, though, the slicers had to go.

Thanks to Rob for allowing this guest post!


Calling for Guest Posts!

February 11, 2010

 stephen

“This must be a fashionable fight, it’s drawn the finest people.”

-Stephen

Last week, in in one of the comment threads, several of you were all over me with suggestions and questions, in a very good way.

And I think that’s to be expected, even though I know my way around this stuff pretty well.  I mean, “I can take the Pepsi challenge with that Amsterdam stuff any day.”  (That’s for you, Kasper.  And Pulp Fiction fans everywhere.)

But no one can really know every trick, every optimal method.  Excel and PowerPivot are both just too deep for one person to master everything.  And this is why community interaction is so much fun!  Everyone has something to teach, to suggest, or even to ask – and all of those activities enrich us all.

With all that said, it’s time for me to stop hogging the microphone.  If you have something cool you want to share – a data set you used, a technique you discovered, something you saw me do but you think you can improve it, let me know.  I’d be happy to post it here.  If you have your own blog, you can of course post it there, too – I’m just offering my site as a way to get more exposure for your ideas.

So, drop me an email if you have something in mind.  This is a standing offer.

And don’t worry, I’m not going to stop posting.  You can’t keep me away from this stuff :)