PowerPivot Data Cleaning: Three Options (with a football angle of course)

October 30, 2009

In the previous PowerPivot football project post, recall that I’d encountered some dirty data.  Even in the world of sports stats, data quality is a huge problem:

Where's 007?

Some players have names, others are just numbers.  And we can’t edit/delete these records in-place, for the PowerPivot compression reason I outlined last time.  I mentioned three options to address this problem, so what are they?

Option #1a – Pre-filter the data during import (graphically)

In my previous post, I used the point-and-click method to grab my database tables out of SQL Server.  I performed no filtering whatsoever, even though there IS a really simple, Excel-style UI for filtering the data included in the import wizard:

PowerPivot Excel-Style Query

In many cases, that will work for you – you can deselect certain values and set simple rules like “equals” and “does not equal.”

In my case, though, I don’t think that works.  I just have too many bad values to uncheck them individually.  And there isn’t a rule for “is text” or “is not numeric.”  (Something we should add next time I’d think).

So, moving on to:

Option #1b:  Pre-filter the data during import (by writing SQL)

That same import wizard also contains a tool that lets you write (or paste) you own SQL statements:

SQL Query

Which I’m pretty sure I could figure out how to do if I researched “filtering non-numeric values using SQL” on the web.

But for now, let’s stick with the “I’m just an Excel user” vibe. So…

Option #2:  IT Gives me Clean Data to Begin With

Does that sound lazy of me?  :)  Not really.  Did you see the post earlier today on Enterprise BI Mashups?  One of the quotes in there is about how IT focuses on data sources and data quality, freeing business users to build better reports and analyses.  Specialization.  Division of labor.

And while PowerPivot does not require it, it’s still very much a best practice.

Option #3:  Clean the Data in Excel with a Linked Table

Let’s go with the most Excel-style solution.  PowerPivot has Clipboard support!

So…  I can select the entire DimPlayers table (ctrl-A), copy it (ctrl-C), switch over to the Excel window of the same workbook, and paste it.  Then it’s just a matter of sort/filter, and delete to get a table that only contains names:

PowerPivot Cleaned Table in Excel

OK, so how to get it back into PowerPivot?  Do you see the “Create Linked Table” button in the Excel ribbon above?  That’s a feature of our addin that copies the selected Table from Excel into PowerPivot, and then links the PowerPivot table back to the Excel table.

(Note that in order to do this, you have to format the Excel data as a real Table.  I use Ctrl-L)

Here’s the cleaned table back in PowerPivot:

PowerPivot Linked Table

See the little link icon on the sheet tab?  That means I can edit the data back in Excel, flip back over here, and the edits automatically flow through (actually, we replace the entire table, but the effect is the same).

Maintaining the Table in Excel

There are some lists that are just meant to be maintained by a human being.  For those, this linked table feature is perfect – open the spreadsheet every now and then, update the list, PowerPivot refreshes, and then you re-publish.

This local-editing-thru-Excel also works well in my case, since I am not getting any new data from Stats (although…  please please please, Stats, give me a free license – we will make beautiful music together).

The one case where this won’t work very well is if the Players table data changes all the time on the backend.  I don’t want to manually update my workbook all the time.  Remember, scheduled automatic report refresh is a big PowerPivot feature.  In those cases, options 1 and 2 are the ways to go.

Next Football Post


Article on Enterprise BI Mashups

October 30, 2009

Given the extremely mashup-friendly nature of PowerPivot, I found this article fascinating:

http://searchsoa.techtarget.com/news/article/0,289142,sid26_gci1372670,00.html# 

Especially liked this part (my emphasis):

"Mashups won’t succeed in a BI context unless it’s fun, unless its highly interactive and the user says ‘wow that’s totally easy to use let me use that so I can build my reports,’" said Kobielus. "If it’s approached where IT is forcing self-service on reluctant users, I think it will fail. It’s got to be user enthusiasm that drives this."

"The key is to break up the work a bit," said Robert Eve, EVP of marketing at data virtualization vendor Composite Software. "We work on the data plumbing side. You have your data-oriented people – your data architects. Then you have people more focused on the business consumer and the application usage to work on the visualization side."

It sounds like someone has been reading the PowerPivot playbook :)

Actually, I prefer to think that these are very insightful people independently reaching the same conclusions we did – it has to be light, easy, and fun.  And no matter how good the tools are, the users still need good, clean, accurate business data, so the role of IT becomes more clearly-defined.  Division of labor and specialization always makes for greater efficiency.


Honk if you hate site previews!

October 30, 2009

You know those annoying popups whenever you hover over any link or image on the site?  That serve no purpose whatsoever, other than to clutter your screen and make your status bar flicker all the time?  Argh.

I just now figured out how to turn them off.  Good riddance.


PowerPivot ranked #1 favorite feature from SharePoint conf

October 30, 2009

File this in the “I couldn’t agree more but am maybe a touch biased” department:

http://blogs.technet.com/sharepointexperts/archive/2009/10/26/sharepoint-conference-2009-favorite-technologies-unveiled.aspx