As I have started working with clients around training/consulting, I am surprised by how often I find tables that have been flattened, i.e. the data tables have been combined (joined) with lookup tables, to produce a really wide table with everything and the kitchen sink in it. Maybe that’s to be expected; after all that is the modus operandi when working in Excel. That’s the first habit that I try to break. I would nominate this for the top 5 mistakes but that already has 6 on the list 🙂
Usually it is just a matter of changing how data is being pulled from the source to address this issue, and go from flat table to separate Data and Lookup tables (Star Schema). However at times, your data source itself may only have the flattened version of data. If you have little control/influence over the data source, you may be stuck with a flat table. Or not!
Flat to Star: Using DAX Query
The first thing that should come in your mind to deal with this should be Power Query, and we will cover that. But this time let’s try to use DAX as a query language for this purpose. Our sample data, is a flattened file which has sales data as well as product attributes. We would like to separate these into distinct power pivot tables.
The hardest part is getting started; for that watch the video. I will skip to the DAX query used, with a tip of the hat to Marco and Alberto from whom I have learned so much (and continue to do so).
EVALUATE: Evaluate is to DAX query as Select is to SQL query. Start with Evaluate, and use a DAX expression that returns a table – and that’s your DAX query. If that sounds scary, realize that you probably are already aware of many DAX functions that return a table – FILTER, ALL, VALUES to name a few.
SUMMARIZE: Summarize is like the SQL group by, and I’m using that here to get a distinct set of rows for my product table.
Once you have your DAX query results in an Excel table, you can even click “Add to Data Model” to send the query results back to Power Pivot as a table. Isn’t that cool!
You can indeed accomplish the same using power query. The main steps being to remove unneeded columns and then removing duplicates. Watch the video for more.
Data Gained, Data Lost
While feasible, you should be aware of the downsides of this approach, i.e. building your lookup (dimension) tables based on transaction record. The primary downside, is that you would only get data pertaining to the transactions that actually took place. For example, if there are products, customers, employees… that do not appear in any transaction in the period being considered, they would not show up in your lookup tables either. For one, this would mean you can never look for things that didn’t happen.