**Out of 407 Total Combinations of Subcategory and Region, These 8 Stand Out This Month**

*(These 8 Combinations Differed GREATLY from their Respective 12-month Averages)*

# A Post Long in the Thinking: Formulas that “Scan”

*(What’s this, you say? An actual “how-to” post from Rob? Well YEAH, it was about time!)*

**DAX is fantastic at calculating numbers for us, and producing metrics that drop jaws all over the world.** It is, quite simply, The Best Thing Ever. You want an all-up metric that properly accounts for seasonality and price changes simultaneously across all divisions of your organization? That thing you’ve always talked about like it was The Holy Grail? Well, it’s really no problem.

**But I’ve now had a long obsession with making DAX do more than that for us.** I want it to not just *produce* the metrics, but also to then *robo-scan* said metrics on our behalf and report back on interesting things that it found. I don’t necessarily get too many chances to indulge this desire, and that’s a shame, so today I’m gonna use the old reliable AdventureWorks fake data.

# Not Just “Off the *Average*,” But “Well Outside *Normal* *Variation*”

**The basic idea here is “alert me if something has changed dramatically.”** If there’s a corner of my business that has spiked or crashed in a big way, I want to know. If something has dramatically improved in a particular region, I may want to dive into that and see if it’s something we can replicate elsewhere. And if something has fallen off a cliff, well, I need to know that for obvious reasons too. And both kinds of dramatic change, positive and negative, can easily be obscured by overall aggregate values (so in some sense this is a similar theme to “Sara Problem?”).

**So the first inclination is to evaluate distance from average performance.** And maybe that would be fine with high-volume situations, but when we’re subdividing our business into hundreds or perhaps thousands of micro-segments, we end up looking at smaller and smaller sample sizes, and “normal” variation can be legitimately more random than we expect.

Consider these two series of numbers:

## 12, 2, 12, 2, 12, 2, 15

## 8, 6, 8, 6, 8, 6, 15

**Both series have an average of 7 over the first six data points, and both end with 15 as the seventh data point.** If we just go by the average, that final value (15) is just as “much” of an outlier in the first series as the second: in both cases, our most recent number is 8 units higher than our average.

**But our eyes can tell that 15 is far more “outlandish” in the second series than the first, right?** We were just cruising along, hyper-consistent around its average, and suddenly, BAM, huge spike to 15. I can definitely imagine wanting to be notified of the second series’ spike and NOT caring about the first series’ spike.

**So we bring Standard Deviation into the picture – not to replace “distance from average,” but to complement it.** Series one has a standard deviation of 5 and series two has a standard deviation of 1. In some sense our eyes are “seeing” the standard deviation difference between these two series of numbers – series one can’t make up its mind, is super hot and cold, whereas series two is almost a flat line. Standard deviation is, in other words, a measure of how “choppy” a data set is, or what is considered “normal variation.”

**Now, bringing Standard Deviation and “Distance from Average” together…** if we compare distance from average to standard deviation, we get a sense of whether this most recent change is a *bigger than normal change!*

In series one, 15 differs from the average by 3 units more than its standard deviation, or a 60% spike outside of normal variation: (15 – 7 – 5) / 5 = 60%

In series two, 15 differs from the average by 7 units more than its standard deviation, which is actually a 7,000% spike outside of normal variation: (15 – 7 – 1) / 1 = 7 = 7,000%
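If it helps to see that arithmetic actually run, here’s a small Python sketch of the same calculation (the `spike_pct` function is my own naming, not from the post; `statistics.pstdev` is the population standard deviation, matching the STDEVX.P function used later on):

```python
import statistics

def spike_pct(history, latest):
    """How far the latest value sits outside normal variation,
    expressed as a fraction of the standard deviation."""
    avg = statistics.mean(history)
    sd = statistics.pstdev(history)  # population std dev, like STDEVX.P
    return (abs(latest - avg) - sd) / sd

series_one = [12, 2, 12, 2, 12, 2]
series_two = [8, 6, 8, 6, 8, 6]

print(spike_pct(series_one, 15))  # 0.6 -> a 60% spike
print(spike_pct(series_two, 15))  # 7.0 -> a 7,000% spike
```

Same averages, same final value, wildly different verdicts, exactly as our eyes told us.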

**The sensitivity thresholds for each biz are going to be different.** Hey, 60% may be plenty to catch your eye (probably not though). But 7,000% is absurdly out there and definitely worth hearing about.

# Let’s Write Some (Measure) Formulas!

```
[Most recent 12 months sales avg] :=
CALCULATE([Total Sales],
    FILTER(ALL(Calendar),
        Calendar[MonthID] <= MAX(Calendar[MonthID]) &&
        Calendar[MonthID] >= MAX(Calendar[MonthID]) - 11
    )
) / 12
```

**I’m choosing to use the “greatest formula in the world” flavor of calendar navigation** here rather than the built-in functions like DATEADD because we can generally make them work in ALL biz scenarios, but also because in this case I’m just more comfortable being “hands-on” with the logic rather than trusting the magic functions to do the right thing.

This relies on a MonthID column in my Calendar table:

**Subtract the First Year in Your Data Set (2001 in my case) from CalendarYear, Then Multiply by 12 and add MonthNumberOfYear – This Yields a MonthID that You Can Navigate with Arithmetic**
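In Python terms, that caption’s arithmetic works out to the following sketch (the names `FIRST_YEAR` and `month_id` are mine, purely for illustration):

```python
FIRST_YEAR = 2001  # the first year in my data set

def month_id(calendar_year, month_number_of_year):
    # (CalendarYear - FirstYear) * 12 + MonthNumberOfYear
    return (calendar_year - FIRST_YEAR) * 12 + month_number_of_year

print(month_id(2001, 1))  # 1
print(month_id(2004, 7))  # 43
# Consecutive months always differ by exactly 1, even across year-end,
# which is what makes "MAX(MonthID) - 11" style navigation work:
print(month_id(2004, 1) - month_id(2003, 12))  # 1
```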

**Moving on** to more formulas…

```
[Most recent month sales] :=
CALCULATE([Total Sales],
    FILTER(ALL(Calendar), Calendar[MonthID] = MAX(Calendar[MonthID]))
)
```

```
[Absolute Delta from Average] :=
ABS([Most recent month sales] - [Most recent 12 months sales avg])
```

**I’m taking the Absolute Value here** because I want to treat “up” and “down” spikes equivalently, AND because STDEV never returns negative numbers.

```
[STDEV Sales over Last 12 Months] :=
STDEVX.P(
    FILTER(ALL(Calendar),
        Calendar[MonthID] <= MAX(Calendar[MonthID]) &&
        Calendar[MonthID] >= MAX(Calendar[MonthID]) - 11
    ),
    [Total Sales]
)
```

**And finally, the “detector”** measure:

```
[Am I Big Change] :=
IF([Absolute Delta from Average] >
    (20 * [STDEV Sales over Last 12 Months]),
    1,
    BLANK()
)
```

**I’ve highlighted that 20 because that’s nuts – YOU WON’T USE 20 THERE!** That 20 means I’m saying “let me know when the latest number is TWO THOUSAND PERCENT of the normal standard deviation,” whereas earlier in this post, even a 60% spike was worth a look.

**Seriously, you need to tune your formula to a sensitivity** that fits your situation. And you do that by changing that 20 to other values. Start with 3, because that theoretically means you’ll catch only the most-outlying 0.3% of cases.

**Well remember, I’m working with fake data here, and I had to crank the factor up to 20** before I got “small” results (like the 8 combinations of Subcategory and Region displayed in the pivot at the top of the post). Hopefully, you can use something more reasonable, like 3, but play around with it and find out. (You could also use a disconnected slicer and make it a variable that the user can control of course.)

**Furthermore, my fake data was loaded with lots of silly cases** where Subcat/Region pairs exhibited ZERO standard deviation, which is just not going to happen in real life. So I filtered out any cases where that happened, too – just to make the results look more realistic.

**Because My Fake Data Was Loaded with LOTS of Zero-Deviation Examples, I Added the Highlighted Test**

*(You Probably/Hopefully Don’t Need That Part)*
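For readers who want the whole detection pipeline in one place, here’s a compact Python sketch of what the measures above collectively do for one Subcategory/Region combination (my own translation, not DAX: the `sensitivity` parameter stands in for that highlighted factor, and the zero-deviation guard mirrors the highlighted test just mentioned):

```python
import statistics

def is_big_change(monthly_sales, sensitivity=3):
    """Flag the most recent month if it sits more than `sensitivity`
    standard deviations away from the trailing 12-month average."""
    window = monthly_sales[-12:]        # last 12 months, incl. the latest
    avg = sum(window) / len(window)
    sd = statistics.pstdev(window)      # population std dev, like STDEVX.P
    if sd == 0:
        return False                    # the zero-deviation guard
    return abs(monthly_sales[-1] - avg) > sensitivity * sd
```

With `sensitivity=20` you get the deliberately absurd threshold I needed for the fake data; with real data, start at 3.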

# And Now… the Pivot Itself

First I created a Flattened Pivot…

Then added my Subcategory and Territory fields, plus the measures:

But I don’t want to see Subtotals here, so I turn those off…

And then I filter the “inner” field (Region in this case) to only show me [Am I Big Change] = 1…

And then, because my Calendar table “overruns” my Sales table (it contains dates that “happen” after my last Sales row of data), I add a HasData report filter:

Which itself is a calculated column in my Calendar table:

That prevents me from EVER using a “bad” month (one that contains no data) as my most recent month.

# Portable!

First off, we can add a Month-Year Slicer and go look at outliers in past months as well:

**We are NOT Locked into the Strictly Most-Recent Month (July 2004) – We Can Slice Back and Look at Other Months**

*(And we see there are more outlier combinations in May than in April)*

Furthermore, this is NOT “locked in” to Subcategories and Regions specifically. We can try out other combinations on Rows of the pivot, as long as we remember to reset that [Am I Big Change]=1 filter on the innermost field…

**Customer Education Level and Subcategory Work Too – No Formula Changes Required**

*(Although you might need to tune the sensitivity differently)*

# Let Us Know If You Use This (or Not)!

**We’d be very interested in real-world applications of this technique,** because such things simply make us happy when we hear about them. So please circle back here if you have success… but also if you don’t.

**Also, any ideas how this can be improved? Flaws in my thinking?** Throw those at us too.

Great to see some statistical analysis in DAX.

BTW, in the quotes just before you document the 1st measure, the 2nd quote reads “series 1” – it should read “series 2.”

Ah good catch thanks! Fixed. And thanks for the kind words too 🙂

Agree with David – really nice to see stats in DAX.

As DAX is so powerful I’d think that you could also create some very insightful analysis on how your data structure changes over time (like development of standard-deviation or correlations over time, Pareto-Analysis on outliers over time). Graphical display of the results would deliver interesting insights to Forensics and Marketing for example.

I was pretty amazed to see how fast Monte-Carlos-simulations ran in DAX for example: https://blog.oraylis.de/2016/05/monte-carlo-simulation-with-dax-and-power-query/

Very interesting, thanks a lot.

Can you comment also on the speed of the calculations? Is this method scalable over large databases? E.g. what if we were trying to find outlier customers over millions of customers?

Well it’s gonna break down somewhere. This example only “scans” 407 possible combinations, so as you scale up, it can quickly explode (imagine 1 million customers times 5,000 products).

Furthermore, this technique relies on FILTER and STDEVX.P, both of which are manual iterators that “crawl through your data on their hands and knees.” (These functions are not able to utilize compression as a speed advantage, unlike simple CALCULATE filters, and furthermore *also* fail to utilize more than one core of your CPU, for a double-whammy of slower.) Our iterations over the Calendar table could probably be sped up quite a bit, and forced to iterate over 40-ish rows (of Months) rather than the < 1,000 rows (of dates) it currently is scanning. So if we hit a speed snag that would be the first place we'd look. That said, most cases in which I'd want to leverage this would revolve around segments of the "population" rather than individuals. Even the largest of businesses can often be segmented in useful ways, and variations at the individual level (whether that be specific customers, specific products, whatever) tend to naturally be noisier anyway - it takes some aggregate-level change in your business for a segment (or combination of segments) to spike or drop, but an individual can flip much easier. Anyway, give it a shot and see if it works for your needs. If too slow, like I said, I think the FILTER can probably be sped up by a factor of 100 or so.

Interesting and useful approach ! Thanks

While you show the ABS Delta and 12-Month Standard Deviation values, I’d probably enhance that to also show the percentage they equate to: Pct Std Dev = DIVIDE([Abs Delta from Average], [STDEV Sales over Last 12 Months]). It just adds another level at which to focus one’s attention on “the Most Outside Normal” on that percentage basis, since each line item has its own “normal.”

Ah yes, that WOULD be a good addition for sure. It won’t change the list of combinations being displayed of course, but it WILL give the consumer a better sense of WHY it got displayed. (Which is what you were saying Chris, I’m just making sure that others who read this exchange understand as well).

I did the same – adding a Pct Std Dev measure – but then went a step further and calculated an “Outlier Factor Measure” which I think is a bit simpler for business use…

“Outlier Factor Measure” calc is Delta from last 11 months divided by Standard Dev for last 11 months. This calc puts out “Outlier Factor Measure” as a tangible number which you can rank the Pivot Table with. Setting it up as a whole number makes it easily readable, but keeps the decimal in the actual cell of the pivot itself. Handy for a dashboard!

Also, you don’t need to have the “Am I a big change” measure as you can just filter the pivot table on Values for “Outlier Factor Measure” > 0.

But, if you want to filter out certain Outlier Factors using the disconnected slicer method (as mentioned in post), you do need to have the “Am I a Big Change” Measure, else you can’t filter out outliers < 5 (for example) without faffing around within the Pivottable.

Love this extension/modification too, thanks for sharing 🙂

Great post Rob. I love practical use stuff like this.

Holy crap! I was JUST trying to figure out how to do a very minimal version of this and it was driving me crazy. I don’t entirely know how this happened, but in the five years I’ve been using Power Pivot I have NEVER run across the “Value Filters” option! Great timing, and awesome how-to post!

We aim to please around here Andy 🙂

I don’t know about everyone else, but I’m just impressed the post managed to exclude the terms coefficient of variation and/or relative standard deviation. And I’m not saying I wrote this comment here so that google will find this post when somebody queries for those terms, but… 🙂

*Chuckling*

This is the kind of stuff that I love reading on this blog! Thanks Rob!

Rob,

Thanks for pushing the envelope on DAX and taking the time to make the formulas available. The measures are a great solution to a real puzzle!

Awesome post! Very excited to see some more how-to info that includes statistics and can easily be applied to real world situations. Nice!

Hi Rob, I was wondering, how would you set this up, for example, to see a change between a period of two weeks rather than just by month? I usually have to interpret results every two weeks.

Best,

Matthew

More than 2 years later, I’m back to answer 🙂

You just need to establish a “PeriodID” column or similar in your Calendar table, rather than the MonthID column I use in this example.

So you’d need a calc column along the lines of “divide week number of year by 2 and round down,” but accounting for year is going to be harder since weeks tend to “straddle” year-end boundaries (a problem we don’t have with months).

So we’d be tempted to subtract min year from year, then multiply by 26 (you know, 52 weeks a year divided by 2) and add that to the calculation above. But that will mess up on year-end situations.

I suddenly found myself obsessed with this problem, so here’s what I just cooked up for a calc column:

```
=RANKX(FILTER(ALL('Calendar'), 'Calendar'[DayOfWeek] = 7), 'Calendar'[Date])
```

That formula will give us a single-week ID that doesn’t mess up on year boundaries. But you want 2-week periods, not 1-week periods.

So we add a “divide by 2 and then round down” step…

```
=TRUNC((RANKX(FILTER(ALL('Calendar'), 'Calendar'[DayOfWeek] = 7), 'Calendar'[Date]) + 1) / 2)
```

Notice that there’s a “+1” in the numerator before we divide by 2. Otherwise the very first week gets stamped with a zero value by itself, which is not what we want.
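That divide-and-round step can be sketched in Python (purely illustrative; `period_id` is my name for it, and the real work happens in the calc column above):

```python
def period_id(week_rank):
    # Two-week period ID from a 1-based week rank. The +1 keeps the very
    # first week from being stamped with a zero-valued period of its own.
    return (week_rank + 1) // 2  # integer division plays the role of TRUNC

print([period_id(w) for w in range(1, 7)])  # [1, 1, 2, 2, 3, 3]
```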

Hi Rob,

Shouldn’t we omit the max month from the calculation of the average and standard deviation, since it influences the last-12-month calculation? Like in the example above: to calculate a deviation of 0 from 7, 7, 7, 7, 7, 7, 7 you don’t take the 15 into account.

I have this working at the macro level, but when I select a filter (in the column header) these measures are not filtered. Any thoughts?

Awesome post by the way! But I have some problems with new products that haven’t been sold for 12 months, so the calculated measure “most recent 12 months sales average” doesn’t quite do the trick. Any suggestions?

In the beginning of the post where we read “In series one, 15 differs from the average by 2 units more” shouldn’t it be 3 units?

Yup. Fixed now. So funny that I had the 60% correct (3 divided by 5) even though I had 2 at the beginning of the sentence. I probably finished up writing this post in the wee hours of the morning or something… I hope 🙂

Rob, the very first measure you created was:

```
[Most recent 12 months sales avg] :=
CALCULATE([Total Sales],
    FILTER(ALL(Calendar),
        Calendar[MonthID] <= MAX(Calendar[MonthID]) &&
        Calendar[MonthID] >= MAX(Calendar[MonthID]) - 11
    )
) / 12
```

The measure assumes there are 12 full months of data to average. This might not be the case, say, in a retail environment with a new store, a wholesale application with a new customer, or a wholesale application with a customer that purchases sporadically.

Do you have any suggestions in the above measure to replace “/ 12” with a measure that determines the actual number of months with data within the past 12 months?

Cheers!

Ron

“In series one, 15 differs from the average by 2 units more than its standard deviation, or a 60% spike outside of normal variation: (15 – 7 – 5) / 5 = 60%

In series two, 15 differs from the average by 8 units more than its standard deviation, or a 160% spike outside of normal variation: (15 – 7 – 0) / 5 = 160%”

Quick question: Why is each series divided by 5?

n-1; the number of periods used to calc STDEV, minus 1.

Mitchell was giving me WAY too much credit 🙂

I was *intending* to be dividing by the standard deviation. But Series 2 had an SD of 0, not 5! I was dividing both series’ “distance from normal deviation” by the SD of Series1, which was simply a dumb mistake.

I suppose I unconsciously didn’t want my very first example to have a div by zero error in it? Who knows.

So I’ve changed the two series now. The second one now has an SD of 1. And the example now does appropriate arithmetic (I hope!) resulting in 7,000% rather than 160%

In my experience, when using absolute delta from average some extreme outliers can make regular values get flagged as well (if you have a value so large that it shifts the average away from most data points, making most data points seem “way below average”).

Using absolute delta from median might come in handy in those situations.

Ooh, that’s an interesting angle I had not considered. Next time I do this with real data…

Curious: how long have you been doing this kind of “delta from expected” analysis, Franz?

Hey Rob,

Been reading your website for the few last weeks and loving it.

I’d like to know how I could adapt this so I can have a factor for each SKU (presuming that I have a table with two columns: SKU and factor).

Thanks a lot, and sorry for my english (i’m from Brazil).

Hi Rob, is there a way to address missing data? In my real data I don’t have data for some months and this affects the standard deviation…

thanks in advance

Hi,

Using your post I created a table to intercept outliers in sales orders by month, but now I’m dealing with some products that don’t have sales in some months, and I found that the standard deviation differs from the correct value because it doesn’t consider the blank months. Do you have a suggestion to show the correct standard deviation?

Thanks

To treat the blanks as zeroes, try adding a + 0 in the numerical portion of the STDEV measure?

```
[STDEV Sales over Last 12 Months] :=
STDEVX.P(
    FILTER(ALL(Calendar),
        Calendar[MonthID] <= MAX(Calendar[MonthID]) &&
        Calendar[MonthID] >= MAX(Calendar[MonthID]) - 11
    ),
    [Total Sales] + 0
)
```
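To see why that + 0 matters, here’s a tiny Python illustration (the numbers are made up; `pstdev` stands in for STDEVX.P, which skips months where [Total Sales] returns BLANK):

```python
import statistics

sales = [10, None, 12, 11]  # None stands in for a month with no sales

# Skipping blanks (what happens without the + 0):
skipped = statistics.pstdev([s for s in sales if s is not None])

# Treating blanks as zero (the "+ 0" trick):
zeroed = statistics.pstdev([s if s is not None else 0 for s in sales])

print(skipped < zeroed)  # True - the zero month widens the deviation
```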

Does it take courage or foolhardiness – after so many great brains have commented – to ask how you conclude that 15-7-5=2? …I get 3 fairly consistently TODAY each time I stare at it. 🙂

And “thanks” to MITCHELL for observing that the divisor is n-1. I had sloppily thought the divisor was the SD because in the first series they are both 5 and my little brain was attracted to the idea of expressing the current period variance from the 6 prior period average results as a multiple of SD’s… kind of a cousin to the CV and/or a trick the six sigma (empirical rule) guys like?

But fogginess again rears its ugly head for me when SCOTTSEN tosses in CV (& RSD, which the google tells me are synonyms), because these are SD/Avg – which I have used before – BUT Rob is dividing by n-1. I can see that both the coefficient of variation and the ROBefficient of variation indicate variation levels — no — I get it: the CV is an indicator of relative variation in the population; the ROBefficient is an expression of the current period relative to the historical level of variation. Ouch. Was that good for you too?

Your hilarious comment brought me back to this post. As added above, it’s fixed now. So funny that I had the 60% correct (3 divided by 5) even though I had 2 at the beginning of the sentence. I probably finished up writing this post in the wee hours of the morning or something… I hope 🙂

Years later even I don’t recall why I divided by 5, so now I need to go make sure that it was n-1 in MY brain… 🙂

Well it turns out… nope. Mitchell was just giving me too much credit. I’ve corrected the examples now, and as commented above, here are the changes:

I was *intending* to be dividing by the standard deviation. But Series 2 had an SD of 0, not 5! I was dividing both series’ “distance from normal deviation” by the SD of Series1, which was simply a dumb mistake.

I suppose I unconsciously didn’t want my very first example to have a div by zero error in it? Who knows.

So I’ve changed the two series now. The second one now has an SD of 1. And the example now does appropriate arithmetic (I hope!) resulting in 7,000% rather than 160%

You asked for real world examples. When I had a problem come up I vaguely remembered this post, and I was able to search the article and use the measures to highlight variances.

Our electronic gateway company handles our EDI transactions. For example, we’ll send an 850 PO and expect 856 ASN and 810 Invoices. However, the gateway is less than perfect shall we say. A couple of months or so ago we added a location to our system to receive 810s from a particular supplier. Our gateway provider flipped the settings for some reason – the new location began receiving 810s but the old locations STOPPED getting them. For 2 months we weren’t getting invoices and our supplier was going bonkers! The gateway has good reporting on what they do send, but of course couldn’t tell us what hadn’t been sent, even though they were the ones who blocked it!

So I’m using your formula to validate our past average number of weekly invoices and measure that against our current week’s invoice total by vendors who send 810s, and if the number rises or falls beyond * Std Dev then we need to take a look.

Oh my goodness this is PERFECT. Combine that with a clever use of alerts, perhaps? Whether built-in Power BI service alerts, or failing that, Power Update alerts? (The former is simpler but the latter is a lot more capable).

I recall your alert articles too, but I do have a day job 😉

We have to use Excel since our department doesn’t have expertise in hosting PowerBI on our SharePoint intranet, though that’s on our wishlist. But we play with the numbers so much that we never freeze our reports long enough to build a dashboard.

We only get the ERP extract files with invoices once a week so an alert system would be overkill – we just get the report Monday morning and have our AP team members call the vendor. UNICHAR support in Excel would be nice so that I could have the measure show an up or down arrow instead of a traffic light KPI. It’s near the end of the year so a lot of vendors are sending a rush of invoices to get paid – they trigger the threshold alert but we can ignore them.

I wondered with the VAR construct if there’s an elegant way to condense the five measures into fewer, but they operate on different time periods so probably not.

And on the Standard Deviation – (possible dumb question coming since I hate statistics even more than date arithmetic), is there a way for the StDev measure to take the deviations between monthly or weekly groups instead of against the daily values? In other words for 12 months of data n=12 instead of n=365 (or however many individual records there are)? That would provide a more realistic set of values against which the deviation is plotted.

Well if you’re limited to Excel and locked out of PBI service, then you definitely need Power Update alerts 🙂

And… wow. I thought my measure in this article DID calculate on a monthly basis, but nope, it does it on a daily basis. Wow, this whole post may need re-writing. I’m starting to wonder if that’s why I needed 20x the deviation.

These are the pitfalls of working with fake data. You can’t have any good intuition of whether advanced stuff like this is correct.

I’m not in a place where I can test this formula right now, but something like…

```
[STDEV Sales over Last 12 Months] :=
STDEVX.P(
    FILTER(ALL(Calendar[MonthID]),
        Calendar[MonthID] <= MAX(Calendar[MonthID]) &&
        Calendar[MonthID] >= MAX(Calendar[MonthID]) - 11
    ),
    [Total Sales]
)
```

Try that and report back?

1) I don’t feel “limited” to Excel 😉 On the contrary, the freedom to poke around in my own sandbox, copy pivot table results to use in “standard” Excel functions and charts, and share the data to allow others to mix ‘n’ match their own dimensions without affecting the “master” dashboard are core benefits. I could write a blog post on that! For example, PERCENTILE measures take forever to run on my data – for 35,000 YTD payments it’s much easier to copy the pivot table listings to a new sheet and create a chart showing the distribution of check amounts. Not dynamic, but 10 seconds of copying the data to the chart table vs. waiting 25 minutes for the measure to update! Of course, I readily admit my measures might be really bad.

2) I’ll try to test your measure soon, but your measure change is a very important distinction on filters that poseurs such as myself have trouble with. I would have initially thought to wrap the STDEV() around a SUMMARIZE function that totals the sales (or invoices) by month, then running STDEV on that result. I don’t immediately grasp the distinction in changing your measure to ALL(Calendar[Month ID]) instead of ALL(Calendar). To my amateur level of thinking, if ALL(Calendar) ends up returning n rows then changing it to [Month ID] still returns the same number of rows in the filter rather than giving me the summarized groups. You’ve probably written blog posts galore on this, but it’s always a hurdle for me.

1 – Preaching to the choir here, I still love Excel as my “pure numbers” environment, even when I’m using DAX and M. I subsequently often move the resulting model into Power BI though – for more polished visuals. But boy, is the PBI cloud service a godsend. Not having THAT would really sting, so I think that’s what I meant by “limited.”

2 – SO FUNNY. “I was expecting something simple like SUMMARIZE, but you used ALL().” Exactly the opposite of my experience! I forget – do you have an SQL background?

Try writing a COUNTROWS(ALL(Calendar)) vs. a COUNTROWS(ALL(Calendar[MonthID])) – ALL() is kinda like VALUES() when you use it as a Table argument, but with all filters cleared.

I think the weird thing here is the duality of functions like ALL(). Most commonly, we use those functions as a “verb” – as a filter argument to CALCULATE. But we can ALSO use them as Table arguments, which is more their “noun” usage. We don’t have the same level of ambiguity with SUMMARIZE(), which is so clearly a table/noun.

If you have our 2nd edition book, go give Chapter 24 a quick read. The concepts in there may change your life 🙂

a) Your rewriting of the measure does work, as I’m sure you realized.

b) I finally got my copy of your book back from the person I loaned it to and looked at Chapter 24. Alas, it’s too brief for me to fully understand your point as it focuses on overriding filters rather than grouping data. I used the COUNTROWS you noted above to see how many rows were returned, and indeed the number of rows varies. But… how does this get fed back to the measure to group the STDEV accordingly? I do weeks instead of months but the concept is the same.

```
Invoice Count :=
CALCULATE(
    DISTINCTCOUNT(Vouchers[Invoice_Key]),
    USERELATIONSHIP(Vouchers[Account], 'Account Tree'[Account]),
    USERELATIONSHIP(Vouchers[Dept ID], '3PL Cost Centers'[Center]),
    USERELATIONSHIP(Vouchers[Dept ID], Depts[Dept])
)

STDEV Group Count Over Last 6 Months :=
VAR WeeksDuration = 26
RETURN
    STDEVX.P(
        FILTER(
            ALL('AP Calendar'[Week ID]),
            'AP Calendar'[Week ID] <= MAX('AP Calendar'[Week ID]) &&
            'AP Calendar'[Week ID] > MAX('AP Calendar'[Week ID]) - WeeksDuration
        ),
        [Invoice Count]
    )
```

My ALL clause in FILTER is returning 26 rows according to the COUNTROWS. So how do the 26 rows from the calendar table “know” how to group the invoice counts? I dimly feel this relates to the concept of extended tables… but it would be nice to have a picture 😉

I’m not sure if other people on this thread appreciate my enjoyment of this chat – you can decide whether to post or not!

SQL was kind of a detour in my history… let’s see if I can remember everything in order.

PDP-11 compiler in college

dBase II and III (then to FoxPro)

FORTRAN

Turbo Pascal

Lotus 1-2-3 macros

WordPerfect macros

AS/400 (tough to use, never got the hang of it)

Wang (at least we didn’t have Prime and Unisys too)

Can’t remember what language the DEC VAX 1100 series used… was it DCL or something? Great machines though.

– then client-server came along –

Power Builder

JCL, COBOL (briefly)

FOCUS (much easier than JCL for pulling large datasets from the mainframe to the client)

VB 5 and 6 (still the foundation of my VBA macros 25 years later)

– Moved into more consulting and management roles at that point, got tired of learning the next company’s programming language du jour –

SQL to run against ERPs

DAX and M

Smattering of R and Python, wouldn’t trust the numbers I pull yet

Can I please stop now? Haven’t I done enough damage?

Never got into bare-metal languages like C and its descendants. I’ve probably gotten the most value from dBase/FoxPro and subsequently VB since I focused mostly on data analysis rather than developing transactional programs.

Here’s the crux of the matter. My Senior VP accuses me, partly tongue-in-cheek, of being a numbers guy because I can generate this. I don’t think of myself that way, I’m a strategic decision-maker. (My boss sees that too at least.) But you need the numbers to know the truth of where you are, if not how to get to where you want to be, and in my experience the numbers most people rely upon are wrong. So you have to test the orthodoxy, and if I’m better at making strategic decisions it’s because I’m better at getting right numbers. I eagerly await the day when I can turn all my efforts over to my staff and trust their results.

The trouble with trying to quit any sort of language is that new analysts are smart, and they can use their language du jour to pull numbers, but they don’t know how to ask the right questions at the beginning or understand the quirks of the data and why you have to join on this field rather than that or why the ERP reports a total in each row of data instead of the fraction that should be associated to that particular transaction. To train them you have to show them the numbers you expect to get and work backwards to show them the question(s) they need to ask of the data at each step so they don’t make a wrong turn. So you have to be able to pull the numbers and know if they pass the smell test as part of the training. SQL goes against the ERP, M goes against the SQL, DAX goes against the M, then turn it over to their skillset if it’s not PowerXX. It works for me, but as in our ALL() discussion I have a way to go too.