Siloed and Slow: What Happens When You Aren’t Using Power BI’s Data Model “Brain”

May 28, 2019 at 6:44 pm

Fantastic post, Rob! I appreciate having learned the fundamentals from P3 that enable me to deliver value every day instead of waiting 3 years!

May 29, 2019 at 2:16 pm

“Right,” said Fred.

🙂

May 29, 2019 at 9:28 am

Silos: without proper ventilation, they can build up dust and lead to a “dust explosion.”
“Ventilate” your data!!

May 29, 2019 at 10:29 am

Great article!

May 29, 2019 at 10:47 am

Suggesting that PowerBI is the ultimate brain is disingenuous at best, even misleading. Now understand, I love PowerBI, but as with the proverbial carpenter, give a man a hammer and everything becomes a nail. (An old shop teacher even taught us how you could nail in a screw!).

PowerBI is wonderfully powerful, but still lacks basic capabilities that it needs to solve for in the longer term (and perhaps will):
– it needs to implement a security layer, to allow user based access rules.
– since it is ultimately architected as a workstation product, it will suffer scalability issues.
– it needs to be able to create calculated members in dimensions other than measures.
– The “model” must be shareable across multiple PowerBI dashboards.
– PowerBI makes it tough to combine data at different levels of granularity. It’s possible, but you have to be careful about how you create your model.
– PowerBI visualizations need more flexibility in orienting data. For example, it’s important to be able to put measures on the rows.

Again, PowerBI is a great tool that tremendously advances the visualizations in this space. While it is a work in progress, everyone should consider how PowerBI can augment their reporting environment. And creating a proper “model” is a mandatory first step in building powerful dashboards.

Lastly, dumping on some poor IT rube for creating a view of the data in SQL and creating a “FrankenSplice” wasn’t necessary. Stuff like that generally comes about because people can’t properly articulate what they want. Creating proper structures / relationships back in the storage layer (or in the analytical layer!!) means that data is properly joined and referenced. By all means, create multiple tables as described and get the job done. But take your solution back to IT, to see if there’s value in incorporating it into the general warehouse, especially if you couldn’t articulate the requirements in the beginning. Let someone else reap the benefits of your insight.

May 29, 2019 at 2:19 pm

Dave, given your objections (and the fact that Power BI does indeed do ALL of the things you say it can’t), I can only conclude that you are viewing Power BI as a viz tool and not a data engine, which is precisely the misconception I was driving at in this article.

Power BI grew out of SQL Server Analysis Services, the leading OLAP server on the market for many years. It is a robust server product, through and through, adapted to the desktop purely to give it a convenient development environment (and an easy/free entry point).

May 29, 2019 at 12:38 pm

Hi Dave,

A few thoughts on your points.

– Row-Level Security is (has been) available and can be implemented dynamically based on Azure Active Directory.
– The Power BI Service is backed by Azure, and ready to Scale. (On-Prem Report Servers Also Available).
– Sharing a common model across multiple reports and dashboards, as well as ad-hoc analysis? Done, Power BI Service Makes it Happen.
– Creating standard entities for model building? Power BI Dataflows to the rescue.
– Suggesting that handling differing granularity is “tough” might be a stretch. Sure, not a novice skill, but plenty of multidimensional modeling concepts are also “tough” before we learned them.
– Displaying measure names on the row axis is a toggle switch in the visual setting.

You’ve nailed the fundamental issue of communication breakdowns between the Business and IT. As you said, we rarely can articulate what we want or need until we realize it. I 100% agree. This is a shared problem, not an IT problem or a Business problem. So Power BI (used properly) can help us achieve a higher rate of iteration and collaboration between IT and the Business as a result of putting Powerful tools in the hands of more people.

I love architecture as much as the next person, and advocate for it with our clients when appropriate. I’d personally prefer the investment in architecture be matched by a discovered value from analytics, rather than a hoped value from analytics. If we have to build the EDW before we can figure out what we should do with our data, we’ll never build the right EDW. Getting the business directly involved and equipped is how we overcome those hurdles. When the bell tolls, effectiveness with Business Intelligence is measured by the quality of our actions and decisions, not the quality of our databases and dashboards.

Justin

May 29, 2019 at 1:34 pm

Not having Azure Active Directory myself, I can’t comment on the security response. I’d like to see some real world scaled implementations (with TCO factored in) to understand how this grows up. 200 million rows is a good start. I plead ignorance to most of your observations, but I will of course push into it. Again, PowerBI is a great tool, I just hate it when the hype tends to get people over their skis.

I’m still strongly biased towards an architecture that separates (or can separate) the modeling from the visualizations. The current strategy is to hype self-service BI, and few laypeople have the patience or skillsets. (I’m not protecting turf, just trying to make sure the job gets done right and quickly.)

Measured against SQL, PowerBI and its modelling is a no brainer. Measured against an OLAP layer, it requires more careful thought. I guess the real message is that your data needs a real OLAP model. If PowerBI is the modelling tool at hand, then of course, it’s the tool to use.

I love the collaboration “potential” and putting more powerful tools in the hands of more people, but that alone won’t solve the problem.

May 30, 2019 at 12:38 pm

This hits the nub of a source of some confusion for me… I keep seeing advertisements for Power BI developer roles which stipulate that the candidate MUST be able to run ETL processes through SSIS (something I know nothing about but have a vague sense relates to the above).

As far as I understand, the Power Query component of Power BI is in itself an ETL tool, so the ability to run ETL through SSIS is surely completely redundant?

Is the correct response to such job ads “WTF are you talking about SSIS for, when Power BI can do that stuff so much better?” or am I misunderstanderizing something?

May 30, 2019 at 3:32 pm

YES YES YES! You are absolutely seeing a symptom of what I’m talking about.

To many people, Power BI is just “the new SSRS.” The latter relied on SQL, and everyone just assumes PBI is the same thing.

And if you think PBI=SSRS, then OF COURSE most of the work is at the SQL layer, and the “heaviest” version of that is SSIS.

As things stand today, I would absolutely NOT apply for any PBI job with the word “Developer” in the job title, because that organization is tipping their hand in terms of how they view PBI.

We ARE developers, of course. But it just so happens that TODAY, anyone who refers to us AS developers is 99.9% likely to be in the “missing the entire point of all of this” bucket.

Lastly, there are some big advantages of using SSIS over Power Query, even in a world where Data Flows is starting to chip away at those advantages. I have nothing against SSIS. What I have a problem with is using the storage layer as the brain. And the jobs you are seeing are definitely still stuck in that mentality.

Awesome comment/question, thanks for taking the time to submit it 🙂

May 31, 2019 at 10:21 am

Au contraire! Thanks for replying to ME and for writing such an awesome and fun blog to begin with. I’m really just getting involved with Power BI with an eye to doing some contract work in the area, so to have my bullshit sniffer validated in this way gives me some confidence it’s slowly all sinking in 😉 – Keep up the good work

May 30, 2019 at 2:37 pm

SO GOOD. Thanks Rob for continuing to pump out inspiration.

May 30, 2019 at 3:27 pm

From my perspective, there are two points in your post.

First, the need for an OLAP tool. We can all agree that having an aggregation engine is great and that SQL is not good at it. The PowerBI engine (PowerPivot part) is awesome for that. DAX is not my favorite but not especially worse than SQL. I remember preferring MDX. But obviously, there are not many players in the OLAP field, which is why, IMO, no one cares about the brain. The reason is that Microsoft was so good with Excel + SSAS thanks to the deep integration between those products.

The second part is about agility. An EDW can be slow to evolve, but that isn’t always the case (well, it’s always an IT project). Obviously, it’s not a good place to experiment. PowerBI is a nice complement to an existing (and still evolving) EDW for defining new metrics and/or merging new data sources. I don’t buy the idea of starting from raw data sources; it would be too painful and slow (to deal with the data mess).

Just to take a real-life example, what if your product id is different in the website and the e-commerce system? Will you make a bridge table, just to find out that yours isn’t the same as another analyst’s? Maintain it yourself forever? Would you not prefer a company-wide table maintained by someone else (you would still add your own bridge sometimes)?
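For readers who have not met the term, here is a minimal sketch of what a bridge table does, written in plain Python rather than in Power BI or SQL. All ids, table names, and numbers are hypothetical sample data invented for illustration; nothing here comes from the article's dataset.

```python
from collections import defaultdict

# Hypothetical sample data: the two systems identify products differently.
web_visits = [("W-100", 40), ("W-200", 25), ("W-300", 10)]  # (web_product_id, visits)
shop_sales = [("SKU-1", 5), ("SKU-2", 3), ("SKU-1", 2)]     # (shop_product_id, units)

# The bridge table: one row per known (web id, shop id) pair.
bridge = [("W-100", "SKU-1"), ("W-200", "SKU-2")]
web_to_shop = dict(bridge)

# Total units sold per shop product id.
units_by_sku = defaultdict(int)
for sku, units in shop_sales:
    units_by_sku[sku] += units

# Combine both sources through the bridge; track ids the bridge misses.
report = []
unmapped = []
for web_id, visits in web_visits:
    sku = web_to_shop.get(web_id)
    if sku is None:
        unmapped.append(web_id)
        continue
    report.append((web_id, sku, visits, units_by_sku[sku]))
```

The `unmapped` list is the maintenance burden in miniature: every analyst who keeps a private bridge has to notice and fix those gaps separately, which is the argument for one shared, company-wide table.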

Another real example, as an analyst, would you like to use a data source where employees (to be removed from computations) are to be identified by a column smartly named “dummy_char_1”?

The perfect mix is when you can do whatever you want (and PowerBI brings you that) while relying on a strong foundation for 90% of the work (that’s the EDW part). By the way, the PowerBI work done by analysts is a good indication of what the EDW should prioritize (because there is demand).

Regarding speed, your PowerBI example works well because, while there is a lot of data underneath, you are only doing serious work on 2000 rows (number of days * number of ad types). You are aggregating fact tables and then dealing with the results. That isn’t much work (but still a wonderful job by PowerBI).
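The "aggregate first, then work at the small grain" pattern can be sketched in a few lines of plain Python. This is only an illustration of the idea, not a description of how Power BI's engine works internally; the fact rows, column layout, and the `aggregate` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, ad_type, amount). Real fact tables would
# hold millions of rows; the grain after aggregation stays small.
ad_spend = [
    ("2019-05-01", "search", 100.0),
    ("2019-05-01", "display", 50.0),
    ("2019-05-02", "search", 80.0),
]
sales = [
    ("2019-05-01", "search", 30.0),
    ("2019-05-01", "search", 45.0),
    ("2019-05-01", "display", 20.0),
    ("2019-05-02", "search", 200.0),
    ("2019-05-02", "search", 40.0),
]

def aggregate(rows):
    """Sum amounts per (date, ad_type): one output row per day and ad type."""
    totals = defaultdict(float)
    for day, ad_type, amount in rows:
        totals[(day, ad_type)] += amount
    return dict(totals)

spend_by_key = aggregate(ad_spend)
sales_by_key = aggregate(sales)

# The "serious work" happens only on the aggregated keys (days * ad types).
sales_per_spend = {
    key: sales_by_key.get(key, 0.0) / spend
    for key, spend in spend_by_key.items()
    if spend
}
```

However many rows the two fact tables contain, the final dictionary has at most one entry per day and ad type, which is the roughly 2000-row figure mentioned above.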

Now, let’s say the transactions table doesn’t have the ad link but a session id. This would be insanely more complex. Let’s say you want to attribute any further non-ad sale to an ad (but only if it was its first purchase and first visit). Even harder. Not really hard to code in SQL (slower, but who cares if it takes 5 minutes in the middle of the night to update the EDW).

At some point you need serious data shaping, and I’m quite sure M is not close to SQL on an analytical database. Having basic knowledge of DAX and good knowledge of SQL beats good DAX and good M knowledge.

I still agree with your 8-year-old post: https://powerpivotpro.com/2011/11/why-powerpivot-is-better-fed-from-a-database-pt2/ but maybe I’m slow to evolve 🙂

May 31, 2019 at 1:11 pm

So much to unpack! And agree with 🙂

Yep, OLAP is the key. But most folks reading this have never heard of it (as you’ve pointed out). And come on, DAX is freaking amazing! If you preferred MDX, you are a member of a super-elite cadre – those who actually “mastered” MDX. Internationally I estimate this population at 1,000 people, tops. I never could even *learn* MDX, much less master it, and I was in charge of that Excel-SSAS integration you so appreciate.

Raw data sources are a fine thing to start with in my book, but it depends on exactly how “raw.” Deciphering underlying ERP schemas for instance is a bit TOO raw for my taste. Starting with a bunch of exported CSV files, on the other hand, is FAR preferable to waiting on the EDW to catch up. Let the EDW catch up AFTER we’ve already been delivering business improvement. “Catch up” is the right phrase.

Your point about only manipulating 2k rows (days and ad types) is factually correct under the hood, but that’s due to the “magic” of Power BI’s brain! It turns the real-time aggregation of millions of rows into child’s play. The point I was making here is that SQL (and other storage layers) don’t get to “cheat” – they have to deal with all the individual rows, and choke to death in the process.

I still prefer good clean databases to ANY other data source, including PQ/M. It’s just that when we DON’T have the good clean db, we don’t wait. Ideally the db work can proceed in parallel with our pragmatic “need results now” approach. But it’s very rare that it makes sense for the business problem to WAIT on the db work.

June 1, 2019 at 9:58 am

Follow the Value stream vs the Salary Stream (Revenues vs Costs) – always has always will.

June 4, 2019 at 3:33 pm

I’m bookmarking this article, as it articulates the problem very nicely when I’m beating my head against CHQ trying to get more data access. We, unfortunately, are mostly at the “run a SQL report, export to Excel, repeat for several different reports, then join those reports together in PBI” stage. I got temporary access to a couple of databases to try and make the point that this was practically criminal negligence, wrote out a report live, on the call, that handled one of these things with a refresh time in seconds instead of days, and they went “huh, cool” and took my access away.

Here’s to hoping they’ll listen to you better than me!

June 15, 2019 at 10:46 am

Thanks, Rob. Wonderful post, super inspiring. I am facing the same situation that you mentioned in the post, and I keep fighting to prove the power of the data model inside Power BI, which is something Tableau will never have a chance to win on.


Resuming from Last Week

Let’s “tease” point #3 real quick…

1) Storage brains are siloed brains.

“Fine, just unify the databases into one!”

2) Storage brains are Slow – in two ways.

SQL Queries are Slow to Write

3) Power BI is NOT a Visualization Tool, and it’s NOT the New SSRS

So… Don’t Do it “Like That”

