zoeZoe Helps Me Answer a Long-Simmering Question

This summer we welcomed Zoe Stein (an Industrial Engineering major from Georgia Tech) to the team for a summer internship.  Which is super exciting just in general – Data wasn’t really “a thing” when I was in school, and to see Engineering majors becoming interested in what we do is very encouraging/validating.

So, what exactly are universities TEACHING, when it comes to data?  Well we have plenty of anecdotal evidence that Power BI *is* being taught at universities, by way of them using our book as a textbook.  But overall, universities tend to shy away from teaching tools that are specific to a particular vendor, instead preferring programming languages.

And these days, the most popular such language is something called R.  I’ve long been curious to get a handle on how R compares to DAX and M, and since Zoe comes from an R background at Georgia Tech, I was very curious to find out what she thought of DAX and M – after a summer of working with those tools at PowerPivotPro.

 

 

The “Exit Interview” on R vs. DAX and M…

ROB:  Where and how were you introduced to R?

ZOE:  I study Industrial Engineering at Georgia Tech, and I was first introduced to R in my introduction to statistical methods class. We didn’t do a whole lot of R in that class besides hypothesis testing, but my professor warned us that we would be diving into it in our upper level classes. Man was he right. We used R for nearly everything in my quality control class this past spring semester.

ROB:  What kinds of projects have you implemented in R?

ZOE:  Like I mentioned previously, we did EVERYTHING in R in my quality control class – homework, mini in-class assignments, and our huge semester-long project. R and I got to know each other pretty well.

I have dug deep into R in 3 specific areas:  Hypothesis testing, Regression analysis, and Control charts.

ROB:  Can you give us an example of each of those three?

ZOE:  Sure, here you go…

  1. Hypothesis testing: Testing whether there is a difference between two types of sealant (Varnish and Lacquer) for 4 different types of wood, and deciding which treatment and material should to choose in order to maximize the expected value of the durability.
  2. Regression analysis: My group’s semester project was designing an experiment to determine the ideal factors for yielding the most popped kernels when cooking stovetop popcorn. We ran a 25-1 fractional factorial experiment – which required cooking 32 batches of popcorn – then ran the analysis in R, finding that the oil type, pot size, and temperature cooked were significant in number of kernels yielded.
  3. Control Charts: Given observations on the oxide thickness of individual silicon wafers, we set up a control chart on oxide thickness and a moving range chart, then determined if the process exhibited statistical control.

ROB:  OK, I understand what you mean with those first two examples, and in fact they seem related.  But ‘Control Charts’ – I have no idea what those are, can you explain?

ZOE:  Control charts are really “just” line charts.  They graph a series of data points to see whether the variations in the data are “natural” randomness or if some other factors are sneaking into your system and throwing things off.

The only thing that makes control charts special is that you calculate the average value and the standard deviation across the entire data set, and put those on the chart as horizontal lines.  You then look for individual values, or sequences of values, that are “fishy.”

control chart sample

Example of a Control Chart
(Courtesy of American Society for Quality)

ROB:  Ooh Neat!  Outlier detection, kinda like Movers and Fakers but more rigorous!  I can TOTALLY do that in DAX!  Challenge Accepted!

ZOE:  Um, did I issue a challenge?  But I’m very interested in what you can put together.

ROB:  (One hour later).  “BOOM!  Control Chart in DAX!”

DAX Control Chart

This Took Me (Rob) About an Hour – and I’m Fascinated. I Want More of This.

ZOE:  Woah cool!!! How difficult was it to produce that?

ROB:  Honestly the formatting took longer than anything.  I don’t think the native line charts in Power BI could do this yet, given all of the tweaks I had to make to the chart in Excel.  But a Power BI custom visual built explicitly for Control Charts would be The Bomb for this – would save all that formatting time in the Excel chart.  I’ll do a followup post on this – cuz I’m hooked, and I need more time to finish it.  It’s not sliceable yet for instance, and if it’s not sliceable, it’s not finished.

OK, back to the questions… Coming from that R background, what were your first impressions of DAX and M?

ZOE:  I found DAX and M to be different at first. I hadn’t written much code with either before. However, there were some similarities to other languages, which made it easier to pick up.

ROB:  “How did those impressions change over time?”

ZOE:  There was a learning curve, but I like DAX and M! It only took reading a few PowerPivotPro blog posts and Google searches before I started picking up even further on the syntax and being able to write it much quicker. I’m a big fan now 🙂 I’m excited to continue working in them and get to the point where I reference the internet less.

ROB:  “How would you now contrast your experience level with Power BI (DAX and M) versus R?”

ZOE:  I would say relatively same skill level for both, but in different contexts, since the projects I have done in Power BI and the projects I have done in R differ quite a bit. The big project I did in Power BI this summer was largely focused on determining key metrics and creating an easily drillable report to observe trends and outliers.

The work I’ve done in R has all been highly analytical – half of the time I’m only looking for a single number or very straightforward answer – “what are the statistically significant factors according to a regression model?” I feel confident in my R skills when running models in R with a static dataset, but when it comes to creating rich visuals and dynamic datasets, I will produce a mean Power BI report every time!

ROB:  Very interesting…  I think that is a crucial distinction.  R yields “small,” specific, and statistically-rigorous answers from static data sets, whereas DAX builds a framework in which we can answer MANY different questions rapidly, against varying subsets of the data.  And statistical rigor is very much possible in DAX, but it’s not what we typically do with it.

So along those lines… now, when would you use DAX and/or M vs. R, vs. both in tandem?

Examples of Power BI/DAX Questions vs. R Questions

A Handy Comparison Reference Created in Cooperation with Zoe

ZOE:  I would default to R when performing a one-off statistical test.  For instance, if I were asked “is there a statistically significant difference in the number of comedy movies watched and action movies watched?” – I wouldn’t see the need to utilize PowerBI over running a hypothesis test in R.

However, as SOON as the business problem involves building a report, data refresh, or slicing… Power BI it is – “How do the movie ticket sales broken down by comedy and action genres in 2017 compare to 2016?” I do think there’s a lot of potential in using the two in tandem. A forecasting visual that I previously used in R is extremely helpful in goal setting and would be awesome to include in a report.

There are many R wizards out there building custom visuals, and I feel that soon, we will be able to find almost any R visual we could want to incorporate in a report in the custom visuals library. The opportunity to use custom R script in Power BI could prove to be extremely valuable in cases that I have yet to come across, but in the meantime, the above is how I’d make the decision of which to use.

ROB:  Thanks Zoe!  I’m sure we’ll be continuing this conversation over the coming year, but in the meantime, we all wish you well back at Georgia Tech this Fall.

Rob’s (Tentative) Conclusions

R Produces Statistically-Rigorous But Static Conclusions. DAX Produces Highly Dynamic and Interactive Results - But Isn't Suited to SOME Statistical Tasks

Based on what I’ve learned from Zoe, and what I’ve read to fill in the gaps, I think the two big differentiators are “how much statistical rigor do you require” and “how many different variations of the question do you need to ask?”

R is a procedural programming language, like C or Java.  “Procedural” means you tell the computer, via the programming language, precisely what to do and when to do it.  Sticking with the visual metaphor above, this means “first put this block here, then this other block there.”  It builds precisely what you tell it to build, and nothing else.  If you want a slightly different version of your pyramid, you have to go change the R code.  If you want to see something different – even a slightly different filter, or to break your results out by Year or Product Line, the Author of the R code must do some new work.  (I’m also suspicious that R doesn’t deal well with a multi-table data set.)

DAX, by contrast, is a functional programming language backed by a MONSTROUS pre-built “brain” (the SSAS Tabular engine).  Once you’re done expressing the mathematical constructs you desire, those constructs are INCREDIBLY FLEXIBLE.  Want to see a year over year comparison?  No problem, just check this checkbox.  A DAX model, in other words, can answer MANY different questions – without having to go back and modify the DAX.  This gives us end-user interactivity, and it also makes things faster for the DAX author as well (cuz we don’t have to write different versions of our “code” to address a simple variation.)

So, advantage DAX in that regard.  But I don’t think DAX is the right tool for “tell me which factors are statistically significant,” at least not on its face.  DAX takes different intersections of values (like Temperature=90 and Product=Oil) and calculates mathematical results for those intersections (and many such intersections all at once) but it does NOT tell us, for instance, “the value of the Temperature column has much more impact on the outcome values than the Product column.”  R can tell us which column, or combinations of columns, have the most impact on our outcome, and by how much (ex: Temp is 2.3x as relevant as Product).  DAX takes column names as inputs, and reports outcomes based on values within those columns.  It does not treat column names as outputs, which R absolutely can do.

It’s telling, in other words, that of the three R project examples cited by Zoe, I’m so intrigued by the potential for Control Charts in DAX, but not so enthusiastic about Linear Regressions and statistically-rigorous Hypothesis Testing.  An interactive Control Chart is something we could build in DAX – and relatively easily I think – but damn-near impossible in R.  Stay tuned.

“Hey, what About M?”

Well, M is procedural, just like R, so it carries the same characteristics of “it does one thing, and isn’t sliceable/dynamic like DAX.”  Which is why I’ve made the surprisingly-controversial recommendation that you should learn DAX before getting deep into M.  It’s just like the reasons why DAX trumps SQL for analysis – SQL, like M, isn’t well suited for quickly answering lots of related questions.  DAX is.

But just like R is purpose-built for statistical analysis, M has its own “sweet spot,” which is data transformation and cleaning.  DAX is completely unsuited for that sort of thing, which is why M and DAX are such amazing “partners” in the Power BI ecosystem.  Being built for a single particular kind of task is a big advantage.  Lots of R programmers perform data prep in R today, and I suspect that they’d benefit greatly from instead using M to do the data prep, and then feeding the prepped data into R for the statistical analysis.  Anyone out there doing precisely this?

  Subscribe to PowerPivotPro!
X

Subscribe

Rob Collie

One of the founding engineers behind Power Pivot during his 14-year career at Microsoft, and creator of the world’s first cloud Power Pivot service, Rob is one of the foremost authorities on self-service business intelligence and next-generation spreadsheet technology. 

This Post Has 34 Comments

  1. Nice article!
    I like to add my 3 cents 🙂

    1) R vs M: Although they are both functional languages (and at least in my rookie eyes have lot in common: http://www.thebiccountant.com/2015/11/11/power-query-power-bi-are-ideal-learning-paths-from-excel-to-r/ ), the query editor in which M runs in Excel and Power BI makes M a direct language as well: After you’ve finished your expression, you can see the results immediately. For me this was essential to get my head around M at all, and I believe that this make a huge difference for many folks, who like me are lacking a formal technical education.

    2) R & M always produce one result, while DAX-measures (I think you agree if we just ignore calculated columns here) produce … “elements” with the determination to produce multiple results. So one formula has the potential to show the correct results in exponentially increasing cases/coordinates of your 2-dimensional reports. Super power, super effective! But they need to be evaluated in an “intelligent environment” like pivot-tables or cubefunctions in Excel or the grid in PowerBI (where the context is provided and the compilation takes place).

    3) While R & M can only produce one result, this result can have different characters: It can either be an end-result (like a nice R-chart) or an intermediate data-dump in form of a flat table. People have used and cherished ordinary pivot tables long enough to be tempted to use implicit measures (or quick measures) as the start of their data discovery. So the environment that DAX cannot live without (or at least cannot show its strengths without) also sets free some M-agic for simpler approaches 🙂

    So my mental image adds M as a flat table with an arrow to a pivot-table to your R-pyramid and the DAX-cube here 🙂

      1. Hi Rob,
        yes, I use R-functions if I cannot implement things in M. The cool thing is that you can encapsulate the R-code into a custom M-function where you can present a nice function-dialogue to the user.
        And if we had the R-integration in Excel as well, I would do much more with it, but this holds me back considerably here at the moment.

        1. Imke can you give us an example of when/how you incorporate some R code into an M wrapper? I’m intrigued, and it sounds from some other comments below that some folks would LOVE this, and don’t yet know that it’s possible.

  2. Rob, another good post. I learnt a lot from this and now see a priority for developing my data analytics skills. First “DAX”, close second “M” and third for polish, “R”.

  3. I’ve never looked at Power BI and R as competing against each other. Yes, they have some similarities but their strengths are very different which you pointed out in the article. I use both on an almost daily basis and sometimes together. My favorite so far was building a dynamic ggplot heat map in Power BI with only a few lines of code and it looked great.

    1. In terms of perception, I think R and Power BI kinda *do* compete, because people rarely get to the bottom of the capabilities of a technology until *after* they have chosen to adopt said technology.

      For instance, I’m a DAX man, through and through. Until recently I haven’t even started to *investigate* R – cuz hey, DAX is killing it for me.

      There’s similarly a whole generation of students now coming up through the ranks being taught that “Data = R.” So there’s a perception gap on both sides.

      Now that I’m getting into it a bit, I’m seeing that R isn’t a replacement for what we do in DAX – I’m positive that if I had to choose one over the other, I’d choose DAX. But since we *can* use both, I want to make sure we are leveraging them both properly. (And people on staff here at PowerPivotPro are already quite well-versed in R – it’s just ME, the “Old Man,” who’s catching up on the strategic front).

      Lastly, I always use the word “versus” in these article titles to indicate “compare” – not “compete.” 🙂

    1. Repeating part of my comment above: I always use the word “versus” in these article titles to indicate “compare” – not “compete.” 🙂

  4. DAX (as expressed in Power BI) can leverage data transformations using R Script steps embedded in Power BI Queries (M), and also Power BI’s “R script visuals” can leverage data prepared in Power BI Queries and DAX transformations. So we probably all should learn more of all 3 to use the best combination of tools for the job.

  5. A lot of your readers will probably have seen Chandoo’s recent post on creating a statistical chart using R.

    http://chandoo.org/wp/2017/08/17/visualize-salary-increases-jitter-plot/

    7 lines of R code… This is a tool we need to learn! Based on the variety of chart options in ggplot2 it doesn’t seem an appropriate use of time for the Power BI team to try and duplicate every visual. But on the other hand, and to Rob’s point, can R read a multi-table data model – or result set – to generate such a chart?

    So while Rob’s distinction may be appropriate today, both sets of tools are going to evolve. It may be easier for R to create a tabular library than for PowerBI to develop an R shell, but we won’t care as long as it makes it easy to display the visuals. The “Sweet Spot” features in the above table are (deliberately?) distinct between business and engineering objectives. Manufacturing may thus be the initial point of integration between the two. Time to study up on what sort of data extracts MRP and MRO systems typically offer?

  6. I cannot agree or disagree more. I have used DAX for last 5 years and used R for couple of years. Both have their own strengths and weaknesses. One cannot replace the other. Like Jimmy Glenn mentioned – ‘It’s not a dax vs m vs r game. It’s how can I make sense of this data as quickly as possible.’ M can be good to retrieve data quickly from variety of sources/connections and R is great for visualization (Power BI or Excel don’t come anywhere close, period). An end user will pick his/her option based on how fast he/she can get to the end result. I think on a long run there has to be a synergy between DAX-M-R. For someone who can get this synergy working well will get the quickest results (provided they need the high degree of analytical complexity that R can offer)

    We also have to keep in mind the segment of people who come to R or have picked R – most of them have already made a choice between R and Python (There’s a very small overlapping end user base between the two). So, people who have picked R, have streamlined their aspirations – they like the speed of R, the vast ocean of packages for flexibility and finally visualization. More importantly they do not belong in production environment (some overlap with DAX end users – “quick and dirty”) or they may not be creating scalable enterpise level ML or AI applications.

    I think this niche segment of end users is the common base between DAX-M-R (or MR. DAX) that makes this an exciting zone. A nice venn-diagram is needed (@Rob Collie).

    Thank you for a wonderful post and even more wonderful comments.

    1. Venn Diagram – I like it! Trouble is, I already spend more time on the visuals in each article than I do on the writing – by a lot, actually. I’ll come back to this topic a lot in the future, I suspect. Plenty of chances for new visuals 🙂

  7. Not a comment about the methods, but what a trip down memory lane seeing a control chart. I used to be in QA during the growth of SPC in the Auto Industry in the 80’s, and the amount of CPK calculations and control charts that I used in my job to track and adjust production on the fly or verify new equipment functionality… Just happy to see it still in use.

    1. This is one of THE COOLEST comments I’ve ever read. Thank you for sharing. I’m kinda infatuated with the idea of control charts now actually, there’s gonna be more on this 🙂

  8. I am an enthousiastic Power BI user. A year ago I bought a booklet about R because someone told me that R was “the” free solution for all statistics. Used on universities. I bought a simple book: R for Everyone from Jared P. Lander. My impression was that R looks like MS DOS and Power BI looks like Windows 10.

  9. Hi, Rob, I want to highlight something. I became a big fan/supporter of DAX following your blog and when I read your book I thought “I’m becoming a Data Jedi”, and I did. I achieved more productive than the rest of my team and that encouraged them to learn the language too.

    I did a lot of awesome projects with that acquired knowledge, and then you wrote a post talking about us ExcelPros and Data Scientists (2012). The things you listed we didn’t do, pushed me to pursue that knowledge (via R) and even started a business after acquiring some of them.

    Now I can confirm that in reality R, DAX and M aren’t competing. Sometimes I use R to make a highly specified analysis that was reproducible and other times I found myself making PowerBI reports because the consumer needed to slice and dice data. At the end I think that through development we are going to find a sweet spot where:

    * The PowerBi environment will benefit for some of the high-level statistical functionality found in R, via tidy data output of statistical objects.
    * The R environment will benefit from the logic that can be infused into data through the DAX/M engines, making more easier to slice and dice data, something very important to the EDA stage when producing high-level statistical analysis.

    So, you’re doing well in putting this thought in public discussion and taking an interest in this topic. I think it’s where everything is heading towards.

    1. Wow wow WOW. So, so good. Thrilled to be a part of career and personal growth stories like this. (And let us know if you’d like to collaborate a bit.)

  10. One of the greatest features of R (and Python… which some of us prefer to R!) is the ability to add libraries on the fly.

    DAX is a very controlled language – you don’t see new libraries being added in to enhance the language the way that new libraries are added in. And for specific or scientific calculations, you still need to create your own custom M functions.

    Now… if we could use Python Pandas libraries in M – well then we’d have something!!

    1. Cannot agree more with the comment above !
      We definitely need to be able to use libraries in M.
      Libraries for data prep’ and analysis are like custom visuals for data visualisation. They extand your power without any additional effort.

    2. On its face I don’t see much need for custom libraries in DAX. I mean, I’d love some date intelligence functions that deal with custom calendars I suppose. But can you give me some other examples of custom DAX libraries you’d like to see? (I’m genuinely curious rather than trying to “call your bluff” or otherwise dismiss you.)

      1. Excel has a pretty robust set of financial functions; that would be somewhere to start from my perspective.
        Trying to do something like a depreciation calculator requires an in depth understanding of ListGenerate in M in order to determine what the value of an asset is at a particular point in time. Totally do-able, but definitely an impediment for your casual PowerBI user.

  11. Great article and lots of very insightful comments. I will say that R does handle multi-table datasets easily, using one of many R packages like dplyr.
    To me, both DAX and R are similar in that they can quite often give the “Wow!” factor to the analyses they can do. They are different, but very complementary when diving into data.

  12. Well, I have to say, this is one of the most exciting posts I have seen on PPPro. I have been converting all our reports from a previous reporting tool to PowerPivot and came across this stuff about control charts through a reference given to me by the illustrious Fred Kaffenberger (yes, your Fred Kaffenberger). Fred steered me towards a video from the 2016 Microsoft Data Summit which featured a PowerPivot implementation for Tyco. In that video was mentioned a lady by the name of Stacey Barr. I looked her up and whoa, talk about Control Charts! Stacey is part of a very modern movement which involves the use of these charts (which she likes to call Smart Charts). These kinds of charts have been used in manufacturing for decades as part of Quality Control. But people like Stacy discovered that they can be used for all sorts of business processes to measure performance and they have now become part of the Evidenced-based Management Movement. They are referred to as Process Behavior Charts.

    Well, dilemma for me. Should I continue with use of PowerPivot or get involved with Process Behavior Charts as our primary reporting mechanism (I do a lot of the corporate reporting for our company).. In fact, I didn’t think we would be ceasing our use of PowerPivot, but I was curious as to whether or not I could use PowerPivot to produce Process Behavior Charts. As I was much of a DAX newbie as I was with respect to using these charts I asked Fred K to assist with creating a PowerPivot report for a Process Behavior Chart.

    So, very cool to see this post!

    If you are really interested in learning about modern uses of Process Behavior Charts in business, I’d strongly recommend reading Donald Wheeler’s book Making Sense of Data, as well as: Understanding Variation: the key to managing chaos. Stacey Bar’s latest book Prove It! is also great.

  13. Rob,

    First of all – great post!

    I think a natural way of extending the topic is to do a comparation between Python and M. As you have mentioned, M is very powerful in terms of tranforming the dataset, while fit well with the power of Python with its data wrangling library like pandas etc.

    Personally I have been using M for 2 years building corporate level dashboard, and recently get into Python and found it fascinating to performing certain task, particularly its massive library.

    Tom

  14. Excellent post agreed. Control Charts are a big deal to those of us that are in the process improvement (PI) space, Lean Six Sigma (LSS) work specifically. I have been using control charts for well over a decade and the idea of finding them thru DAX would be most excellent since now I use a wonderful tool called QI Macros, but I have to go out of PP and into the excel data so I have learned a bit on using PQ to get good data tables for both PQ and my PI work. I wish I was better at PQ and PP and am working on it.

    So you will find a rather large space available for new recruits into the PP and PBI world if control charts and then some other LSS tools can be easily developed with DAX. As an aside I believe Tableau has R integrated into it maybe tree or so years ago?

    Carl
    Lean Six Sigma Black Belt
    PP, PQ,PBI Padawan

  15. Epic post! ROB thank you so much for the time and energy put on these posts! I just wanted to say that i think R and Power Pivot/ DAX serve different analysis purposes. For instance, you cant do machine learning algorithms with DAX while you can do machine learning processes with R. Also, i don’t agree when you say, somewhere in the post that R is used more to analyze static data. In R, you can perfectly set a predefined algorithm to forecast time series data (yeah alright i know, you can do MA’s algorithms with DAX but you cant do ARIMA or SARIMA algorithms with it witch are more used in real world cases, imo) and run that algorithm in a scheduled windows task. You also have R server now, for SQL server and make R to analyze dynamic data from a SQL database for instance. 🙂

  16. the way i see it is , R can be used as a stop gate until a particular functionality is implemented in PowerBI, there is no reason M and DAX can not evolve to solved all the problems solved now by R

Leave a Comment or Question