skip to Main Content

Resuming from Last Week

Last week’s post left off here:  most BI still relies on the storage layer (SQL or its many cousins) to be the brain, and storage layers are terrible brains because they were designed for storage, not for analysis.

This week, I hope to explain the following points:

  1. Storage brains are siloed brains.
  2. Storage brains are slow – in two ways.
  3. Using Power BI doesn’t help much unless you use it properly

Let’s “tease” point #3 real quick…

It’s kind of a shame that the narrative flow “buries” point 3 near the end, so let’s give you a preview of that right here, up front, before circling back to the sequential agenda.

It's Very Easy to Misuse Power BI as a Visualization or Reporting Tool. Harness the Data Model for the REAL Power.

In last week’s article, I talked about the rise of the dumb reporting tools – the ones that depend on the storage layer to be the “brain.”  Power BI is very, very different.  It’s SMART.  It has its own brain (called the Data Model, among other names).

But in order to be a smart tool, FIRST it ALSO has to have everything the dumb tools had!  You still need to be able to draw charts and tables with your data, which is the core capability of the brainless reporting tools.

And that sets a trap.  Power BI offers an amazing superset of what the brainless tools could do, BUT…  you can absolutely still use it like the dumb tools.

We’ll circle back to this.  Now let’s go talk in detail about why SQL (and storage systems in general) are terrible brains for analysis, reporting, and BI in general.

1) Storage brains are siloed brains.

Siloed Business Applications "Under the Hood"Enterprise and Line of Business server applications don’t like to share. For good reasons, it’s best to run them on different servers, with separate storage systems underneath each one.  So if we look inside one of these “siloes” for a moment, it looks a little bit like the figure here.

Because each application “owns” its respective storage layer, there’s no single database you can ask questions of.  The database brain underneath Application A (CRM, for instance) has zero access to the stored data in Application B’s storage.  The database software, and the machine it runs on, is inherently tightly coupled to its disks.

But even in cases where the storage is less tightly-coupled, and Database A can see the data in Database B, it won’t know what to do with it.  The storage schemas are completely different, id’s and keys are incompatible, etc.

So every single system in your organization EXCLUSIVELY spews out reports from its OWN domain.  Since you can’t run SQL queries over data that is stored elsewhere, your CRM gives you reports about…  CRM stuff.  Your accounting system gives you your current financials.  HR system gives you a view of your people.  And you’ve got MANY line-of-business systems, each giving you snapshots of what’s going on in those individual siloes.

Oh, you need to see a more integrated, end to end view of your company’s performance?  Tough luck.  Here’s a hundred different reports.  Happy hunting!

Siloed Business Applications Lead to Siloed Reporting, But Your Business Needs End to End Visibility!

“Fine, just unify the databases into one!”

Enterprise Data Warehouse:  One Database to Rule Them All (and in the Darkness Bind Them)“Just wait for the Enterprise Data Warehouse!”  “We’ll get everything loaded into a single storage system ANY DAY NOW.  And then we’ll be able to use SQL across the siloes!  Finally, we will have visibility!”  Yeah, cough cough, sure.

This is one of the all-time “howlers” – a lie so big, no one believes it can BE a lie.  And like all good lies, it starts with truths.  First, yes, of COURSE it’s a good idea to create a copy of your applications’ data, and use the COPY for all of your BI needs rather than directly querying the original underlying storages.  Second, of COURSE it’s a good idea to be storing point-in-time snapshots of your data, since the application systems only really care about “today,” and will happily forget important details from yesterday (or five years ago).  And finally, of COURSE if you’re ever going to have end-to-end, cross-silo visibility into your data, you have to unify your keys/identifiers/schemas, so that all of the data speaks the same dialect (ex:  all the data “agrees” on what a customer id looks like).

Enterprise Data Warehouse:  Always Three Years from CompletionThose are all Good Things, and they should always be in progress.  We do this stuff for a living, at organizations ranging from mid-market to the Fortune 100, and we absolutely LOVE it when there’s a good data warehouse waiting for us!  But there are problems, the most obvious of which is the “wait for it” part.  The Enterprise Data Warehouse (EDW) is never “caught up” with the proliferation of systems and relevant data sources.  You can add a single LOB application to your organization MUCH faster than you can onboard its data into the EDW.  So even in organizations which HAVE a strong EDW in place, 20-40% of the “important” data is still not yet captured in said EDW.

Of course, even if you DID somehow get the EDW “caught up,” you’d still have to deal with the Slow problem.  That sounds a lot like a transition to the next section, doesn’t it?  Well, it is.

2) Storage brains are Slow – in two ways.

First of all, storage brains are slow to RUN useful business queries.  Since storage brains have to be concerned about the tiniest little individual details (like, a single row) AND they have to be able to update/write to ANY individual row at any time, they can’t be properly optimized for speed when it comes to aggregate-level queries.  Aggregate queries are basically the bread and butter of the business – we don’t care (usually) about a single transaction, but instead care primarily about trends and totals (even though those trends and totals themselves CAN get pretty “drilled down,” which we will see in a moment).

To illustrate this point when I was speaking at a recent Developers conference, I created the following data set.  Three different tables, simulating 1) an organization’s digital advertising data (say, from Facebook or Google)  2) their website visitor tracking  and 3) their e-commerce system.  Three consecutive steps of the digital funnel, in other words.  And all definitely tracked by different operational systems.

SQL Can't Do This.  Tableau Can't Do This.  Power BI CAN.

Left to Right:  Ad Display Data, Website Data, and eCommerce Transactions
(Roughly 10% of our Ad Impressions Yielded Clicks to Our Site, and 10% of Those Visitors Bought Something)

This is a heck of a challenge for SQL, especially when we start asking it meaningful business questions.  Remember, SQL can’t even BEGIN to do this until all the data is stored in one place – which by default, it certainly is NOT.  But even if we allow that the EDW already has all of this, it’s still gonna be SLOW.

Check out this Power BI report:

Yeah in Theory You CAN Do This With SQL or Tableau, But You'll Have to Wait for It.  Power BI will do it in an eyeblink.

This. Is. Insane.
(Once you understand what it’s doing).

In human terms, Power BI is doing the following in order to show us this report:

  1. Find out how much we spent on each type of ad (1-6) – for each Quarter, each Year, each Month, AND each day of the week.
  2. Find out how much revenue we made – based on which type of Ad the customer originally clicked – again for all the same time “coordinates” (Quarter, Year, etc.)
  3. Divide the results of step 2 by step 1 – again for every single intersection of “coordinates,” both time and Ad type – to return a ratio known as ROAS.
  4. Now repeat the entire process – steps 1-3 – for the corresponding timeframe one year prior
  5. Subtract step 4 from step 3, then divide that difference by step 4
  6. Do all of this across many millions of rows – in different tables
  7. Display the MANY results – in a report that has has approximately 700 distinct final answers.

So…  how long do you think the Power BI “brain” has to think on this?  Do the lights flicker in the building while it runs?  Can we go to lunch – or on vacation – while we wait for it to complete?

Nope.  My computer was positively BORED by this challenge.

Power BI crunched hundreds of millions of rows into 700 distinct results without the CPU even noticing.
That’s… a sub-second spike of CPU – all the way up to the lofty heights of… 40% utilized.
(Yes, that’s all it took for Power BI to crunch and display steps 1-7 above across this hefty data set)

Even better:  I did NOT need an EDW to do this!  That’s right, I loaded the data from the three different systems directly into Power BI.  No three year waiting period before getting started.

But assuming you HAVE an EDW, try running the queries (plural) that this will take you in SQL or similar languages.  I’ll sit tight.  Send me a telegram and tell me how long it took.

“Slow to Run” isn’t even the worst of the two Slow problems, however.  No my friends – the queries are also Slow to Write.

SQL Queries are Slow to Write

Dr. Seuss clearly waited on some SQL reports to be written.  See his other seminal work, "Stacks and stacks of DAX!"
Dr. Seuss Clearly Data Modeled

First of all, not everyone can write SQL statements.  Yes, I know that Business Users do indeed sometimes learn SQL, but they have to be a special kind of proficient in order to pull off an equivalent to what I demonstrated above.  And…  even if a biz user IS that good at SQL, are you going to let them directly “ping” the EDW with their own handwritten queries?  Probably not.

So, you have to hand this task off to a dedicated IT pro.  Someone with the required skill and responsibility.  These people do exist of course, because if you have the prerequisite EDW, well, someone had to build that thing, didn’t they?  So there’s at least one such person lurking about.

They’re probably pretty busy, though.  I mean, someone like that tends to be expensive and rare, and they usually have a LOT on their plate.  So part of the “slow to write” is just “waiting for the right person to engage with you.”

And then you have the communication gap.  The biz user needs to explain to the “SQL God” what it is that they need.  This takes some time – emails, meetings, phone calls.  And has a less than 50% chance of being accurate on the first go-round.  And the SQL in question isn’t something the pro is gonna just rip through in five minutes.

It’s not uncommon for this process to stretch into WEEKS in the real world.  That meets anyone’s definition of “slow.”  In fact, one of our company’s single biggest historical consulting projects was to replace a multi-week SQL query-writing bottleneck with a data model, enabling the biz users to answer their own questions in minutes rather than waiting weeks.  And guess who hired us to do the work?  That’s right, the IT Director who was playing the role of “SQL God.”  It was wearing HER out, too – and fortunately she was in a position to seek a better solution.

3) Power BI is NOT a Visualization Tool, and it’s NOT the New SSRS

It's Very Easy to Misuse Power BI as a Visualization or Reporting Tool. Harness the Data Model for the REAL Power.

Back to the previously-teased topic…  its time has finally come!

Sadly my friends, this is a common workflow we still see with Power BI:

  1. Launch Power BI
  2. Connect to a database, load a table of data
  3. Start making charts

No no no no NO.  We didn’t come all the way through the 1980’s, 1990’s, and 2000’s to go right back to doing things the way we always did.

Anyone who describes Power BI as a VISUALIZATION tool is almost certainly following the workflow above, or expecting OTHERS to be using it that way.

And… anyone who describes Power BI as a REPORTING tool is almost certainly following the workflow above, or expecting OTHERS to be using it that way.

In that incomplete mindset, the workflow then progresses past step 3 and takes some really familiar turns…

  1. Launch Power BI
  2. Connect to a database, load a table of data
  3. Start making charts
  4. Immediately realize that you can’t make the charts you need to make
  5. Go looking for a table of data that DOES allow me to make the chart
  6. Fail to find such a table
  7. If I’m an IT type, I go make the required table (or view) in the storage layer – and I’ve fallen right back into the “storage as brain” trap of three decades ago.
  8. If I’m a biz user, I close Power BI and go looking for multiple reports that I can export to Excel

To be clear, steps 1-6 are CORRECT usage of Power BI!  It’s just steps 7 and 8 that are the mistake.

Instead of steps 7-8, the RIGHT decision is to ENRICH THE DATA MODEL.  Even if all you’ve done is load a single table, you already HAVE a data model.  Power BI Desktop creates a blank data model when you first launch the app, and everything you load goes into said data model from the start.  You just don’t have a particularly GOOD data model yet.

What does it mean to enrich the data model?  Well, I’m glad you asked.  At a high level, it entails a number of potential things, depending on the situation:

  • Creating Measures – oh my goodness, this is SUCH a mind-blowing capability.  It’s central and crucial to the entire story.  When I want to assess how advanced a Power BI user is, the first thing I look for is measures – are they using them at all?  (Often, no – it’s not like Power BI Desktop advertises in flashing neon “YOU SHOULD BE WRITING MEASURES.”).  And if so, how sophisticated are the measures?  This tells me nearly everything.
  • Loading additional tables – like in my Power BI example above, where I’ve loaded Ad Display data, Website Visitor data, and eCommerce data – as SEPARATE tables!  (As opposed to waiting around for someone to “make” me a table in SQL that somehow FrankenSplices the three into one grotesque, lumbering monster).
  • Using relationships to link said tables together – in order for my multiple, separate tables to play a “symphony of data” with one another, yep, I’ve got to link them together.  Which often requires additional tables (lookup, or dimension tables) that you normally wouldn’t think of.

It’s no accident that THIS is what our Power BI Foundations class is primarily about – those three things.  Whether you’re taking our public workshop version, or we’re coming to your company to teach it privately, that’s the core focus of the agenda.  Teach you the power of data models.  Show you that Power BI is FAR more than a reporting and visualization tool.

It's Very Easy to Misuse Power BI as a Visualization or Reporting Tool. Harness the Data Model for the REAL Power.

If You Haven’t Built Something that Looks Like This Yet, GOOD NEWS:  Mind-Blowing Capabilities Still Await You!

So… Don’t Do it “Like That”

  • If you hear someone trying to pigeonhole Power BI as “a viz tool,” or “just another reporting tool,” push back.
  • If your Power BI data models don’t look somewhat similar to the above image, it’s time to dive deeper.
  • If someone is saying that Tableau and Power BI are “clones,” ask them about Tableau’s data modeling capabilities (hint:  Tableau fakes that it has a data model, but ultimately, it completely lacks an equivalent to Power BI’s data model, and is just a slightly-smarter version of the 1980’s dumb, storage-dependent tools).
  • If you find yourself writing (or waiting on) tons of SQL to be written in order to give you what you need in Power BI, you might not have discovered data modeling yet.
  • If IT is using Power BI to “pump out reports” like they did with SSRS, with no biz user intervention in the creation of the reports (AND the data models behind them), your org has not yet tapped into the cultural transformation that it needs in order to remain competitive.  We can help you with that, btw.

Thanks for reading all of this, folks.  Honestly I originally set out to write ONE article.  Then split it into two when I saw just how lengthy it was going to be.  And then STILL, I “cut” several topics from part two in order to keep THIS installment halfway-reasonable.  So I’ve got a few more in the hopper that I’ll circle back to this summer Smile

Rob Collie

Rob Collie

One of the original engineering leaders behind Power BI and Power Pivot during his 14-year career at Microsoft, Rob Collie founded a consulting company in 2013 that is 100% devoted to “the new way forward” made possible by Power BI and its related technologies. Since 2013, PowerPivotPro has rapidly grown to become the leading firm in the industry, pioneering an agile, results-first methodology never before seen in the Business Intelligence space. A sought-after public speaker and author of the #1-selling Power BI book, Rob and his team would like to help you revolutionize your business and your career.

This Post Has 17 Comments
  1. Fantastic post, Rob! I appreciate having learned the fundamentals from P3 that enable me to deliver value every day instead of waiting 3 years!

  2. Silos. Without proper ventilation can build up dust and lead to “Dust Explosion.”
    “Ventilate” your data!!

  3. Suggesting the PowerBI is the ultimate brain is disingenuous at best, even misleading. Now understand I love PowerBI, but like the proverbial carpenter give a man a hammer and everything becomes a nail. (An old shop teacher even taught us how you could nail in a screw!).

    PowerBI is wonderfully powerful, but still lacks basic capabilities that it needs to solve for in the longer term (and perhaps will):
    – it needs to implement a security layer, to allow user based access rules.
    – since it is ultimately architected as a workstation product, it will suffer scalability issues.
    – it needs to be able to create calculated members in dimensions other than measures.
    – The “model” must be shareable across multiple PowerBI dashboards.
    – PowerBI makes it tough to combine data at different levels of granularity. It’s possible, but you have to be careful about how you create your model.
    – PowerBI visualizations need more flexibility in orienting data. For example, it’s important to be able to put measures on the rows.

    Again, PowerBI is a great tool that tremendously advances the visualizations in this space. While it is a work in progress, everyone should consider how PowerBI can augment their reporting environment. And creating a proper “model” is a mandatory first step in building powerful dashboards.

    Lastly, dumping on some poor IT rube for creating a view of the data in SQL and creating a “FrankenSplice” wasn’t necessary. Stuff like that generally comes about because people can properly articulate what they want. Creating proper structures / relationships back in the storage layer (or in the analytical layer!!) means that data is properly joined and referenced. By all means, create multiple tables as described and get the job done. But take your solution back to IT, to see if there’s value in incorporating it into the general warehouse, especially if you couldn’t articulate the requirements in the beginning. Let someone else reap the benefits of your insight.

    1. Dave, given your objections (and the fact that Power BI does indeed do ALL of the thing you say it can’t), I can only conclude that you are viewing Power BI as a viz tool and not a data engine, which is precisely the misconception I was driving at in this article.

      Power BI grew out of SQL Server Analysis Services, the leading OLAP server on the market for many years. It is a robust server product, through and through, adapted to the desktop purely to give it a convenient development environment (and an easy/free entry point).

  4. Hi Dave,

    A few thoughts on your points.

    – Row-Level Security is (has been) available and can be implemented dynamically based on Azure Active Directory.
    – The Power BI Service is backed by Azure, and ready to Scale. (On-Prem Report Servers Also Available).
    – Sharing a common model across multiple reports and dashboards, as well as ad-hoc analysis? Done, Power BI Service Makes it Happen.
    – Creating standard entities for model building? Power BI Dataflows to the rescue.
    – Suggesting that handling differing granularity is “tough” might be a stretch. Sure, not a novice skill, but plenty of multidimensional modeling concepts are also “tough” before we learned them.
    – Displaying measure names on the row axis is a toggle switch in the visual setting.

    You’ve nailed the fundamental issue of communication break downs between the Business and IT. As you said, we rarely can articulate what we want or need until we realize it. I 100% agree. This is a shared problem, not an IT problem or a Business problem. So Power BI (used properly) can help us achieve a higher rate of iteration and collaboration between IT and the Business as a result of putting Powerful tools in the hands of more people.

    I love architecture as much as the next person, and advocate for it with our clients when appropriate. I’d personally prefer the investment in architecture be matched by a discovered value from analytics, rather than a hoped value from analytics. If we have to build the EDW before we can figure out what we should do with our data, we’ll never build the right EDW. Getting the business directly involved and equipped is how we overcome those hurdles. When the bell tolls, effectiveness with Business Intelligence is measured by the quality of our actions and decisions, not the quality of our databases and dashboards.


  5. Not having Azure Active Directory myself, I can’t comment on the security response. I’d like to see some real world scaled implementations (with TCO factored in) to understand how this grows up. 200 million rows is a good start. I plead ignorance to most of your observations, but I will of course push into it. Again, PowerBI is a great tool, I just hate it when the hype tends to get people over their skis.

    I’m still strongly biased towards an architecture that that separates (or can separate) the modeling from the visualizations. The current strategy is to hype self-service BI, and few laypeople have the patience or skillsets. (I’m not protecting turf, just trying to make sure the job get’s done right and quickly.)

    Measured against SQL, PowerBI and it’s modelling is a no brainer. Measured against an OLAP layer, requires more careful thought. I guess, the real message is your data needs a real OLAP model. If PowerBI is the modelling tool at hand, then of course, it’s the tool to use.

    I love the collaboration “potential” and putting more powerful tools in the hands of more people, but that alone won’t solve the problem.

  6. This hits the nub of a source of some confusion for me… I keep seeing advertisements for Power BI develop roles which stipulate that the candidate MUST be able to run ETL process through SSIS (Something i know nothing about but have a vague sense relates to the above)

    As far as i understand.. The Power Query component of Power BI is in itself an ETL tool so the ability to run ETL through SSIS is surely completely redundant?

    Is the correct response to such job ads “WTF are you talking about SSIS for, when Power BI can do that stuff so much better?” or am i misunderstanderizing something?

    1. YES YES YES! You are absolutely seeing a symptom of what I’m talking about.

      To many people, Power BI is just “the new SSRS.” The latter relied on SQL, and everyone just assumes PBI is the same thing.

      And if you think PBI=SSRS, then OF COURSE most of the work is at the SQL layer, and the “heaviest” version of that is SSIS.

      As things stand today, I would absolutely NOT apply for any PBI job with the word “Developer” in the job title, because that organization is tipping their hand in terms of how they view PBI.

      We ARE developers, of course. But it just so happens that TODAY, anyone who refers to us AS developers is 99.9% likely to be in the “missing the entire point of all of this” bucket.

      Lastly, there are some big advantages of using SSIS over Power Query, even in a world where Data Flows is starting to chip away at those advantages. I have nothing against SSIS. What I have a problem with is using the storage layer as the brain. And the jobs you are seeing are definitely still stuck in that mentality.

      Awesome comment/question, thanks for taking the time to submit it 🙂

      1. Au Cointraire! Thanks for replying to ME and for writing such an awesome and fun blog to begin with. I’m really just get involved with Power BI with an eye to doing some contract work in the area so to have my bullshit sniffer validated in this way gives me some confidence its slowly all sinking in 😉 – Keep up the good work

  7. From my perspective, there is 2 points in your post.

    First, the need of an OLAP tool. We can all agree that having an aggregation engine is great and that SQL is not good at it. The PowerBI engine (PowerPivot part) is awesome for that. DAX is not my favorite but not especially worse than SQL. I remember preferring MDX. But obviously, there is not many players in the OLAP field, which is why, IMO, no one cares about the brain. The reason is that Microsoft was so good with Excel + SSAS thanks to the deep integration between those products.

    The second part is about agility. EDW can be slow to evolve but that isn’t always the case (well, always it’s an IT project). Obviously, it’s not the good place to experiment. PowerBI is nice to complement and existing (and still evolving EDW) by defining new metrics and/or merging new data sources. I don’t buy the idea to start from raw data sources it would be too painful and slow (to deal with the data mess).

    Just from a real life example, what if your product id is different in the website and the e-commerce system? Will you make a bridge table just to find out that you haven’t the same than another analyst? Maintain it yourself forever? Would you not prefer a company wide table maintained by someone else (you would still add your own bridge sometimes).

    Another real example, as an analyst, would you like to use a data source where employees (to be removed from computations) are to be identified by a column smartly named “dummy_char_1”?

    The perfect mix is when you can do whatever you want (and PowerBI bringing you that) but relying on strong foundation for 90% of the works (that’s the EDW part). By the way, the PowerBI work by analysts is a good indication of what the EDW should prioritize (because there is demand).

    Regarding speed, your PowerBI example works well because, while there is a lot of data underneath, you are only making serious works on 2000 rows (number of days * numbers of ads types). You are aggregating fact tables then dealing with them. That isn’t much work (but still a wonderful job by PowerBI).

    Now, let’s say the transactions table doesn’t have the ads link but a session id. This would be insanely more complex. Let’s say you want to attribute any further non-ad sale to an ad (but only if it was its first purchase and first visits). Even harder. Not really hard to code in SQL (slower, but who cares if it takes 5 minutes in the middle of the night to update the EDW).

    At some point you need serious data shaping, and I’m quite sure M is not close to SQL on an analytical database. Having basic knowledge of DAX and good knowledge of SQL beats good DAX and good M knowledge.

    I still agree on your 8 year old post : but maybe I’m a slow to evolve 🙂

    1. So much to unpack! And agree with 🙂

      Yep, OLAP is the key. But most folks reading this have never heard of it (as you’ve pointed out). And come on, DAX is freaking amazing! If you preferred MDX, you are a member of a super-elite cadre – those who actually “mastered” MDX. Internationally I estimate this population at 1,000 people, tops. I never could even *learn* MDX, much less master it, and I was in charge of that Excel-SSAS integration you so appreciate.

      Raw data sources are a fine thing to start with in my book, but it depends on exactly how “raw.” Deciphering underlying ERP schemas for instance is a bit TOO raw for my taste. Starting with a bunch of exported CSV files, on the other hand, is FAR preferable to waiting on the EDW to catch up. Let the EDW catch up AFTER we’ve already been delivering business improvement. “Catch up” is the right phrase.

      Your point about only manipulating 2k rows (days and ad types) is factually correct under the hood, but that’s due to the “magic” of Power BI’s brain! It turns the real-time aggregation of millions of rows into child’s play. The point I was making here is that SQL (and other storage layers) don’t get to “cheat” – they have to deal with all the individual rows, and choke to death in the process.

      I still prefer good clean databases to ANY other data source, including PQ/M. It’s just that when we DON’T have the good clean db, we don’t wait. Ideally the db work can proceed in parallel with our pragmatic “need results now” approach. But it’s very rare that it makes sense for the business problem to WAIT on the db work.

  8. I’m bookmarking this article, as it articulates the problem very nicely when I’m beating my head against CHQ trying to get more data access. We, unfortunately, are mostly at the “run a SQL report, export to Excel, repeat for several different reports, then join those reports together in PBI” stage. I got temporary access to a couple of databases to try and make the point that this was practically criminal negligence, wrote out a report live, on the call, that handled one of these things with a refresh time in seconds instead of days, and they went “huh, cool” and took my access away.

    Here’s to hoping they’ll listen to you better than me!

  9. Thank Rob, wonderful post, super inspiring. I am facing the same situation that you mentioned in the post and keep fighting back to prove the power of data model inside Power BI which Tableau will never ever have a chance to win on this.

Leave a Reply

Your email address will not be published. Required fields are marked *