Continuing the series on The Great Football Project…
When we last left off, recall that I had used Excel-linked tables to clean all of the “no name” players out of the DimPlayers table, creating a CleanPlayers table to take its place.
I pretty quickly realize that I still have two problems:
- Hey, not all of my remaining players have BirthName’s! (First Names)
- I don’t have a FullName column, which is gonna be awkward when I build reports
Turns out that I can solve both problems with a single calculated column. Time for some…
DAX Formula Language
DAX is the formula language of PowerPivot. DAX really could have been named “Excel Formulas++” but we went with the catchier name. Because really, DAX is just Excel’s formula language with a number of nifty improvements.
I use Ctrl-RightArrow to jump over to the rightmost edge of the table, since PowerPivot supports commonly-used Excel keyboard shortcuts, to the Add Column:
And then I start to type the following formula:
Let’s stop there briefly, because there are a few things worth noting already:
- I’m starting my formula with the typical “=” Excel syntax
- It uses the IF function, and you can see from the function signature under the formula bar that it’s the same IF function from Excel
- All column names in PowerPivot are wrapped in square brackets
- Once I enter the open square bracket, I get AutoComplete against all columns in the current table!
OK, let’s just finish the formula. I notice from the dropdown that there is a [NickName] column, and I remember that in this db, most of the time you want to use [NickName] over [BirthName]. So I’ll fall back to using [BirthName] only when there is no [NickName]:
=IF([NickName]="",[BirthName]&" "&[LastName],[NickName]&" "&[LastName])
(I used the & operator instead of =CONCATENATE for readability reasons.)
I rename the resulting column to be FullName and I’m done. I’ve added an expression-based column to a state of the art, in-memory, column-oriented store (the PowerPivot db), but I don’t really think about it because there was no SQL syntax, no MDX, no VB expressions. I just added a calc column using Excel syntax.
Relating this to the FactEvent table
The FactEvent table, that contains the plays themselves, does not contain player names – it contains player ID’s. This is good db design, but it’s typically a pain for Excel users, since Excel’s features pretty much only work on single tables.
Excel users denormalize multiple tables into one by using functions like =VLOOKUP() that get the job done, but are tedious to write, are often slow to execute on large datasets, and result in large spreadsheets. Plus they have to write one such formula for every column they want to grab from another table.
PowerPivot really shines here, and I suspect Excel pros will adopt it for some of their work based simply on this one capability: PowerPivot is a version of Excel that supports relationships between tables.
Conceptually Similar to VLOOKUP
If I want to think about relationships in a similar way as VLOOKUP’s, I can. I go to the “big” table – the one with the most rows, the one I want to augment with data from other tables. In this case, FactEvent. On the Table ribbon tab there’s a Create Relationship button:
This brings up the following dialog:
I’ve filled in the right values in each dropdown, and done so thinking as if I was writing a VLOOKUP (well, ok, a VLOOKUP where I can match on any column, not just the first column of the lookup table!)
- …and if I happened to get the two tables reversed in my head, no worries, PowerPivot figures that out for me.
- And for my next trick…
- Pretty cool and all, but the real fun starts after I have the relationship is in place. Fireworks
- in tomorrow’s post.