Data Warehouses: What Do I Know?
Well, I’m no expert. I learned Power Pivot and Power BI through Excel. What I know is from helping to build a SQL data warehouse. That is, I’ve worked with a data warehouse pro. So, BI pros or others, please add your observations in the comments!
Data Warehouse: the Promise.
In 2009 on Twitter, Rob Collie noted that the data model in Power BI and Power Pivot: “very much is meant to work in conjunction with, and be fed by, well-designed data warehouses.” (Power Pivot’s Impact on BI Pros). For me, a good data warehouse enables business pros to create reports without needing to think too much about the data model. Another benefit I see is the ability to deliver operational reports from transactional databases (ERP, CRM, financial systems). By design, these systems are for logging events, not for creating reports. And yes, even Salesforce is for logging tasks. Reports are secondary.
Data Warehouse: the Hype.
I’ve heard some idealistic claims from data consultants and Microsoft TSPs. For instance, data warehouses will soon replace Power Query and data modeling. Really? When I hear these claims, I tend to smile and nod. Why? Because, business needs and data sources evolve and the data warehouse lags behind these needs. Another benefit is having a single source of truth. For me, the real promise here is empowering business pros to get reliable data.
Data Warehouse: a Couple of Things.
- Star Schema. Data warehouses use a star schema. In a star schema, data falls into lookup tables and transaction tables. Or, for the data warehouse pro, dimensions and facts. Typically, transaction tables hold time and event-based data. Whereas lookup tables hold the descriptive details for contacts and accounts (attributes). For more info about star schema, see Data Model: Beast to Beauty. One lookup table can contain data that’s in multiple hops in the source database. For example, transaction type code will be one field and its description will be another field– instead of in a separate table.
- Historical Analysis. The purpose of a data warehouse is historical analysis, and this shapes design decisions. Since longer text fields won’t aggregate in a simple way, the first urge is to drop them.
- Platform Agnostic. The data warehouse doesn’t care how users connect. Users can use Power BI, Excel, Tableau, etc.
- Other Abstractions. Data warehouses have other abstractions that may confuse end users. For example, how they handle blank values in data (nulls) or how they handle transactions that don’t connect to lookup tables. And, the data warehouse may use internal IDs not found in the source. While I’m less concerned with these abstractions, they are good to be aware of. A data warehouse will involve a learning curve.
5 Practical Suggestions for aligning a data warehouse with Power BI.
(your mileage may vary)
To begin with, views are critical to Power BI. Views are versions of tables. Typically, these views are for specific purposes. While a good data warehouse person won’t adapt tables to Power BI, they will adapt the views. But you have to know what to ask for.
- First, depending on how your data warehouse handles nulls, your dates may not be continuous. If your data warehouse replaces nulls with 12/31/9999, the date table will have gaps. The view for the Date lookup should filter out this invalid value. While you’re at it, be sure that all of your date fields have sort columns. A Month Year column formatted Apr 2020 will need a Year Month column like 202004 that will sort.
- Second, naming conventions. Some platforms (Tableau, for example) can’t recognize spaces in table names and/or column names. Because of this, the data warehouse may use CamelCase for everything. While Gil Raviv has a query for fixing camel case column names, it’s best to fix this in a view. Similarly, prefacing every table name with the word Fact or Dim can make a data source alien to a business pro. Rename in the view. Always prefer clear human readable names for Power BI.
- Third, long text fields in transaction tables. Long text fields can snag performance in Power BI. But you may need them for operational reports. One solution is to truncate these fields to 256 characters or less. Another approach is to add an index to the transaction table and split the text off into a special lookup table. However, with this second solution, see below.
- Fourth, prevent one-to-one relationships. (Only you can prevent monogamy). With a great data warehouse, Power BI will set up the relationships between transactions and lookups automatically. What can block this automatic setup are tables where both sides have unique IDs. I prefer to add two invalid rows with blank data to the table that needs to be on the many side.
- Fifth, decide what to do about tables that may have multiple lookups to the same table. Your sales table may have a Sales Date and a Ship Date. Because Sales Date is the usual link between the tables, I prefer to rename Sales Date to Date Id (or whatever you Date key is called). This enables Power BI to create the main relationship without the business pro having to think about it. Other cases are when there is a credit account and a debit account. The user can always create a second inactive relationship to the other field.
Note on Relationship Auto-detection in Power BI.
Power BI can match up lookups to data tables. Automatically. Sadly, Excel Power Pivot can’t. Power BI recognizes lookups because they have a column with unique values. It links lookups to transactions on identical column names. Power BI will create one-to-many relationships. It does not attempt to create many-to-many relationships. Thank goodness! It will, however, create one-to-one relationships. Ruh oh. This is because it has no way to determine which side is NOT the lookup side. This problem is what the Fourth suggestion above remedies. What’s so tricky about one-to-one relationships is that it throws off auto-detection of all of the other relationships. And there are other results in the data model from these dreaded one to one relationships.
Microsoft’s platform is the world’s most fluid & powerful data toolset. Get the most out of it.
No one knows the Power BI ecosystem better than the folks who started the whole thing – us.
Let us guide your organization through the change required to become a data-oriented culture while squeezing every last drop* of value out of your Microsoft platform investment.
* – we reserve the right to substitute Olympic-sized swimming pools of value in place of “drops” at our discretion.