In the previous PowerPivot football project post, recall that I’d encountered some dirty data. Even in the world of sports stats, data quality is a huge problem:
Some players have names, others are just numbers. And we can’t edit/delete these records in-place, for the PowerPivot compression reason I outlined last time. I mentioned three options to address this problem, so what are they?
Option #1a – Pre-filter the data during import (graphically)
In my previous post, I used the point-and-click method to grab my database tables out of SQL Server. I performed no filtering whatsoever, even though there IS a really simple, Excel-style UI for filtering the data included in the import wizard:
In many cases, that will work for you – you can deselect certain values and set simple rules like “equals” and “does not equal.”
In my case, though, I don’t think that works. I just have too many bad values to uncheck them individually. And there isn’t a rule for “is text” or “is not numeric.” (Something we should add next time I’d think).
So, moving on to:
Option #1b: Pre-filter the data during import (by writing SQL)
That same import wizard also contains a tool that lets you write (or paste) you own SQL statements:
Which I’m pretty sure I could figure out how to do if I researched “filtering non-numeric values using SQL” on the web.
But for now, let’s stick with the “I’m just an Excel user” vibe. So…
Option #2: IT Gives me Clean Data to Begin With
Does that sound lazy of me? :) Not really. Did you see the post earlier today on Enterprise BI Mashups? One of the quotes in there is about how IT focuses on data sources and data quality, freeing business users to build better reports and analyses. Specialization. Division of labor.
And while PowerPivot does not require it, it’s still very much a best practice.
Option #3: Clean the Data in Excel with a Linked Table
Let’s go with the most Excel-style solution. PowerPivot has Clipboard support!
So… I can select the entire DimPlayers table (ctrl-A), copy it (ctrl-C), switch over to the Excel window of the same workbook, and paste it. Then it’s just a matter of sort/filter, and delete to get a table that only contains names:
OK, so how to get it back into PowerPivot? Do you see the “Create Linked Table” button in the Excel ribbon above? That’s a feature of our addin that copies the selected Table from Excel into PowerPivot, and then links the PowerPivot table back to the Excel table.
(Note that in order to do this, you have to format the Excel data as a real Table. I use Ctrl-L)
Here’s the cleaned table back in PowerPivot:
See the little link icon on the sheet tab? That means I can edit the data back in Excel, flip back over here, and the edits automatically flow through (actually, we replace the entire table, but the effect is the same).
Maintaining the Table in Excel
There are some lists that are just meant to be maintained by a human being. For those, this linked table feature is perfect – open the spreadsheet every now and then, update the list, PowerPivot refreshes, and then you re-publish.
This local-editing-thru-Excel also works well in my case, since I am not getting any new data from Stats (although… please please please, Stats, give me a free license – we will make beautiful music together).
The one case where this won’t work very well is if the Players table data changes all the time on the backend. I don’t want to manually update my workbook all the time. Remember, scheduled automatic report refresh is a big PowerPivot feature. In those cases, options 1 and 2 are the ways to go.