skip to Main Content

In the previous PowerPivot football project post, recall that I’d encountered some dirty data.  Even in the world of sports stats, data quality is a huge problem:

Where's 007?

Some players have names, others are just numbers.  And we can’t edit/delete these records in-place, for the PowerPivot compression reason I outlined last time.  I mentioned three options to address this problem, so what are they?

Option #1a – Pre-filter the data during import (graphically)

In my previous post, I used the point-and-click method to grab my database tables out of SQL Server.  I performed no filtering whatsoever, even though there IS a really simple, Excel-style UI for filtering the data included in the import wizard:

PowerPivot Excel-Style Query

In many cases, that will work for you – you can deselect certain values and set simple rules like “equals” and “does not equal.”

In my case, though, I don’t think that works.  I just have too many bad values to uncheck them individually.  And there isn’t a rule for “is text” or “is not numeric.”  (Something we should add next time I’d think).

So, moving on to:

Option #1b:  Pre-filter the data during import (by writing SQL)

That same import wizard also contains a tool that lets you write (or paste) you own SQL statements:

SQL Query

Which I’m pretty sure I could figure out how to do if I researched “filtering non-numeric values using SQL” on the web.

But for now, let’s stick with the “I’m just an Excel user” vibe. So…

Option #2:  IT Gives me Clean Data to Begin With

Does that sound lazy of me?  :)  Not really.  Did you see the post earlier today on Enterprise BI Mashups?  One of the quotes in there is about how IT focuses on data sources and data quality, freeing business users to build better reports and analyses.  Specialization.  Division of labor.

And while PowerPivot does not require it, it’s still very much a best practice.

Option #3:  Clean the Data in Excel with a Linked Table

Let’s go with the most Excel-style solution.  PowerPivot has Clipboard support!

So…  I can select the entire DimPlayers table (ctrl-A), copy it (ctrl-C), switch over to the Excel window of the same workbook, and paste it.  Then it’s just a matter of sort/filter, and delete to get a table that only contains names:

PowerPivot Cleaned Table in Excel

OK, so how to get it back into PowerPivot?  Do you see the “Create Linked Table” button in the Excel ribbon above?  That’s a feature of our addin that copies the selected Table from Excel into PowerPivot, and then links the PowerPivot table back to the Excel table.

(Note that in order to do this, you have to format the Excel data as a real Table.  I use Ctrl-L)

Here’s the cleaned table back in PowerPivot:

PowerPivot Linked Table

See the little link icon on the sheet tab?  That means I can edit the data back in Excel, flip back over here, and the edits automatically flow through (actually, we replace the entire table, but the effect is the same).

Maintaining the Table in Excel

There are some lists that are just meant to be maintained by a human being.  For those, this linked table feature is perfect – open the spreadsheet every now and then, update the list, PowerPivot refreshes, and then you re-publish.

This local-editing-thru-Excel also works well in my case, since I am not getting any new data from Stats (although…  please please please, Stats, give me a free license – we will make beautiful music together).

The one case where this won’t work very well is if the Players table data changes all the time on the backend.  I don’t want to manually update my workbook all the time.  Remember, scheduled automatic report refresh is a big PowerPivot feature.  In those cases, options 1 and 2 are the ways to go.

Next Football Post: Formulas and Relationships! >>

Rob Collie

One of the original engineering leaders behind Power BI and Power Pivot during his 14-year career at Microsoft, Rob Collie founded a consulting company in 2013 that is 100% devoted to “the new way forward” made possible by Power BI and its related technologies. Since 2013, PowerPivotPro has rapidly grown to become the leading firm in the industry, pioneering an agile, results-first methodology never before seen in the Business Intelligence space. A sought-after public speaker and author of the #1-selling Power BI book, Rob and his team would like to help you revolutionize your business and your career.

This Post Has 7 Comments
  1. Rob, I’m really enjoying this meaty PowerPivot blog.

    Option #4: Bring back the data cleaning tools that were dropped from Gemini! OK, I admit that I haven’t a clue what was in the “Data Cleaning” tab, but I bet it was a better alternative to the options you discussed. Your last comment hit home:

    “The one case where this won’t work very well is if the Players table data changes all the time on the backend. I don’t want to manually update my workbook all the time.”

    And it’s this kind of concern that triggered my comments about linked tables on the PowerPivot blog. Another possible option would have been to import the table into Excel via an external query and “cleanse” the imported table so that on refresh, the table is updated with the cleansing “functions” automatically applied. However, even if external data sources were supported for Excel tables in a PowerPivot workbook (the functionality doesn’t work in CTP2), there would have to be a mechanism on the server to allow automatic periodic refreshes of the Excel table and subsequent updating of the linked PowerPivot table.

    1. Oh, and yeah, bringing back the idea of the data cleaning tools – yes yes yes, I agree. That said, we realized that we could probably spend an entire release on just those (probably more) and we had to cut them completely rather than compromise other functionality in v1 (while still not delivering enough data cleaning to be credible).

      Automated data cleaning in the Excel environment remains my #1 software white whale.

  2. Hi Rob,

    I have one question regarding this
    “There are some lists that are just meant to be maintained by a human being. For those, this linked table feature is perfect – open the spreadsheet every now and then, update the list, PowerPivot refreshes, and then you re-publish.”

    Is it possible to create a powerpivot report based on SSAS and on a excel sheet on sharepoint. When the published powerpivot report will be (automaticly) refreshed he will load the data from SSAS and the data from excel (new data is manually added to the sheet). Will this work ?

    Thanks,
    Kasper

    1. At the moment Kasper, I don’t know of a way to automatically pull data from an Excel workbook during scheduled Data Refresh. At least, not without writing a little code.

      I have high hopes for the Excel Services REST API – that gets us part of the way there. At the moment, though, we’re still missing a little bit of glue between that and our Data Feeds import method (which IS refreshable). I had hopes of that getting worked out before RTM, but it’s looking less likely now. And that’s a bummer.

  3. Hi Rob,

    Just tried the Linked table function and it works great! Thanks for the CTRL-L tip. Too bad i get an error while creating a relationship. I posted the error at the gemini client newsgroup, don’t have to bother you with the technical details.

    Kasper

Leave a Comment or Question