Post By Dany Hoter

Intro from Rob:  In this post, Dany demonstrates how we can use Power Query to add a numerical index column to our table, and then use that to address the previous row in a calc column.  Lots of good stuff here.

More generally, this is achievable using the misleadingly-named EARLIER function.  EARLIER does not mean “previous row,” but coincidentally, you can use it to do just that, and many other “cross-row reference” things.  Check out this post on EARLIER for the basics (or read the calc column chapter in Power Pivot Alchemy), and read some of the comments at the end of this post for further examples.

Take it away, Dany…

Business Case

I was approached with a business question from the car fleet manager in our company.

There are many different types of cars and the fleet manager is trying to optimize the fuel cost and reduce emissions.

The data about the fuel consumption comes from a smart device installed in the car.

Every time a company car enters a gas station the device reads the car’s odometer and reports the car’s ID + the odometer + gas quantity purchased to a central database.

From this data we needed to create a report that shows average fuel consumption by make and model of the cars in the fleet. We got the data as two csv files , one with the entire history of gas transactions and one with the car fleet details.

Sounds Simple

The data model and the report are really basic and straight forward as you can see in the attached workbook which is obviously fake data created for this article.

So where is the challenge? The only problem is that each row representing a fuel transaction includes the odometer for this transaction and the question is how to calculate the distance traveled since the last visit to the pump.

The Excel way

In excel this is a very simple problem.

– Make sure the rows are ordered by car plate # and date

– Subtract the odometer value for the previous row from that of the current row checking that both rows are from the same car. Ignore the first transaction for each car.

clip_image001

The example uses a table and notice that the expression is a mix between structured reference (circled black) and regular reference (circled red) when the cell referenced is on a different row in the table.

One problem with this solution is that it is difficult to make it refreshable without using VBA.

Next month when there will be more data the table need to be sorted again and this is possible only from the UI or from VBA.

The Power Pivot way

Actually there is no PP way without some help. In DAX you can’t reference the previous row in any way because there is no order to the rows. There are probably solutions to this specific example using purely DAX but I’m sure that they are pretty complex.

Some Help from Power Query

I used PQ to create a new column which helps PP understand what the previous row is.

If you open the query that reads the fuel csv file, you’ll see that I sort the data and immediately add an index column to the table which persists the current order of the rows which is ascending by car and date.

Back to Power Pivot

I can use this new column in PP to calculate the difference between any row and the row next to it.

I add a calculated column with this expression:

=IF(

‘Fuel'[Plate]=LOOKUPVALUE(‘Fuel'[Plate],’Fuel'[Index],’Fuel'[Index]-1),

‘Fuel'[Odometer]-LOOKUPVALUE(‘Fuel'[Odometer],’Fuel'[Index],’Fuel'[Index]-1)

,0

)

The index column allows me to reference other rows relative to the current row using the LOOKUPVALUE function.

Summary

Using Power Query I was able to sort the data and persist the order after sort using an index column.

These two operations are part of the import operation and so will be performed each time new data arrives.

The index column enables creating a calculated column that compares data in the current row with data in the “previous” row, when previous is defined by the index column.

Download Files

  Subscribe to PowerPivotPro!
X

Subscribe

This Post Has 17 Comments

  1. An interesting problem – but unless i am missing something, as an alternate technique why not just use the ‘LASTNONBLANK’ function to find the odometer reading for the last time the car was filled up (using ‘FILTER’ or ‘CALCULATETABLE’ to make sure it is calculating with respect to the current car ) ? LASTNONBLANK always uses the chronological sort order of dates; not the sort order in the pivot ui.

    then inside the parameter’s inside of LASTNONBLANK, a combination of DATESBETWEEN and DATEADD (dates, -1,day) should get you what you want. (all dates before today to find the LASTNONBLANK odometer reading.)

    Then finally test to make sure the function returns a value (maybe car has only been filled once – a new car) and decide what result to return. Then subtract the value from the current row to find km driven.

    I apologize for the vague pseudo code, but it should work once the details are sorted through. This may be considered complex, but really my messy explanation above makes it seem harder than it is.

  2. That would work, but you need an additional date-table for providing the necessary order.

    What I like about this example is the view on the fluid boundaries between these tools. They complement each other, overlap and sometimes even rival: Here we could even dare to consider the unthinkable: Doing this without PowerPivot altogether (wonder how long this will survive in here…:-) :

    let
    Source = Csv.Document(File.Contents(“…YourPath\….Fuel.csv”),null,”,”,null,1252),
    #”First Row as Header” = Table.PromoteHeaders(Source),
    #”Changed Type” = Table.TransformColumnTypes(#”First Row as Header”,{{“Plate”, type text}, {“Odometer”, Int64.Type}, {“Date”, type date}, {“Liter”, type number}}),
    #”Sorted Rows” = Table.Sort(#”Changed Type”,{{“Plate”, Order.Ascending}, {“Date”, Order.Ascending}}),
    #”Added Index” = Table.AddIndexColumn(#”Sorted Rows”, “Index”, 1, 1),
    // Add 2ndIndex starting with 0 for self join (shifted Index)
    Added2ndIndex = Table.AddIndexColumn(#”Added Index”, “Index.1”, 0, 1),
    Rename = Table.RenameColumns(Added2ndIndex,{{“Index.1”, “Index2”}}),
    // SelfJoin Table on shifted Index will bring in values from previous row
    SelfJoin = Table.NestedJoin(Rename,{“Index2”},Rename,{“Index”},”NewColumn”),
    Expand = Table.ExpandTableColumn(SelfJoin, “NewColumn”, {“Plate”, “Odometer”}, {“Plate.1”, “Odometer.1”}),
    CalcKM = Table.AddColumn(Expand, “KM”, each if [Plate]=[Plate.1] then [Odometer]-[Odometer.1] else “out”),
    // get rid of first lines that don’t have predecessores, therefore no KM calculation
    FilterLinesWoPredec = Table.SelectRows(CalcKM, each [KM] “out”),
    JoinCarFleet = Table.NestedJoin(FilterLinesWoPredec,{“Plate”},CarFleet,{“Plate”},”NewColumn”),
    ExpandAttr = Table.ExpandTableColumn(JoinCarFleet, “NewColumn”, {“Make”, “Model”}, {“Make”, “Model”}),
    ChangeType = Table.TransformColumnTypes(ExpandAttr,{{“KM”, type number}}),
    CalcAvgTrip = Table.AddColumn(ChangeType, “AvgTrip”, each [Liter]/[KM]*100)
    in
    CalcAvgTrip

    This will create a table with all fields you need but one: The weighted average would be an inserted calculated field in the pivot table itself.

    Stay querious 🙂

  3. Hi,
    Here you go just single DAX calculated column for one shot solution. (no need to powerquery).

    =if(
    len(CALCULATE(max(Fuel[Odometer]),FILTER(Fuel,Fuel[Plate]=EARLIER(Fuel[Plate]) && Fuel[Date]0,Fuel[Odometer]-CALCULATE(max(Fuel[Odometer]),FILTER(Fuel,Fuel[Plate]=EARLIER(Fuel[Plate]) && Fuel[Date]<EARLIER(Fuel[Date]))),0)

  4. sorry for format.

    =if(
    len(CALCULATE(max(Fuel[Odometer]),FILTER(Fuel,Fuel[Plate]=EARLIER(Fuel[Plate])&& Fuel[Date]0
    ,Fuel[Odometer]-CALCULATE(max(Fuel[Odometer]),FILTER(Fuel,Fuel[Plate]=EARLIER(Fuel[Plate]) && Fuel[Date]<EARLIER(Fuel[Date])))
    ,0)

  5. Sorry again, it’s weird- Part of query goes missing after posting the comment.

    anyway,
    Below DAX query will give you latest reading of odometer for that car-plate using Earlier function.

    “CALCULATE(max(Fuel[Odometer]),FILTER(Fuel,Fuel[Plate]=EARLIER(Fuel[Plate]) && Fuel[Date]<EARLIER(Fuel[Date])))"

    and based on that you can take just difference of two reading 🙂

    =if(
    len(CALCULATE(max(Fuel[Odometer]),FILTER(Fuel,Fuel[Plate]=EARLIER(Fuel[Plate]) && Fuel[Date]0
    ,Fuel[Odometer]-CALCULATE(max(Fuel[Odometer]),FILTER(Fuel,Fuel[Plate]=EARLIER(Fuel[Plate]) && Fuel[Date]<EARLIER(Fuel[Date])))
    ,0)

  6. You can in fact do this in DAX, not even that difficult, just build a lookup to previous plate number and take the max prior odometer reading (this does rely on odometer being monotonic, if it was not we would have to do it differently and set date = max(DATE) less than current row):

    PrevOdometer
    =calculate(max([Odometer]),all(fuel),Fuel[Plate]=earlier(Fuel[Plate]),Fuel[Date]<earlier(Fuel[date]))

    You then wrap it in a check where it's blank (first row for plate) and set that to 0:
    =if(ISBLANK(calculate(max([Odometer]),all(fuel),Fuel[Plate]=earlier(Fuel[Plate]),Fuel[Date]<earlier(Fuel[date]))),0,[Odometer]-calculate(max([Odometer]),all(fuel),Fuel[Plate]=earlier(Fuel[Plate]),Fuel[Date]<earlier(Fuel[date])))

    The better answer is that you really do not want powerpivot doing row-to-row lookups. On really small workborks like this it's fine, but powerpivot, as a column-based tool, really sucks at this kind of work. Instead you should be pre-processing the data and adding columns like this somewhere else – Excel, PowerQuery, database, sas, python etc…

    Twisting Powerpivot like this as datasets get larger is like using the back of a screwdriver to hammer in a nail.

    1. I found this reply helpful, because in my case the table from which I needed to obtain the previous row’s data for was data from a previous week, but not every row was repeated for all possible weeks. Without assuring the data are properly spaced (same number of rows per person/unit/etc.). I’m curious if the calculate max all =earlier <earlier approach can be coded directly in PowerQuery, either using existing M functions or a custom function?

      1. Isaac,
        Not quite sure what you’re after here: relative references in PQ or some sort of time intelligence. But these sources might help

        This is an excellent post on how to use relative cell references in Power Query: http://excel-inside.pro/blog/2015/11/05/absolute-and-relative-references-in-power-query/

        A bunch of good examples on how to use date functions in Power Query can be found here: http://ginameronek.com/2014/10/01/its-just-a-matter-of-time-power-bi-date-time-dimension-toolkit/

        Check out MS’s site for details on how they work: https://msdn.microsoft.com/en-us/library/mt296606.aspx
        :

    2. Hi Trevor,

      I would be very interested in seeing the solution for when Odomoter is not monotonic, i.e. when we need to set date = MAX(Date) less the current row. I have been trying to figure out the latter part, that is how to subtract the current row. Would it be possible for you or anyone to clarify this?

      Thanks in advance.

    3. See now, THIS is useful information…understanding the strengths and weaknesses of each technology available makes you a better user/designer/developer/advocate. It’s extremely helpful to know when the answer is, “yes, you can, but no, you shouldn’t.” Thank you for this!

  7. Dear Moderator :-),
    just recognized that my comment should have been a reply to Matthew Brice (otherwise the 1st sentence wouldn’t make sense – could you please rearrange that?

  8. Sorry for posting this here since it does not relate exactly to the topic at hand, but I am having an issue with PowerPivot for which I would appreciate some input.

    The company I am with has embarked on a multiyear, high 6 to low 7-digit project to automate payroll. The product is still a few months away. We took it upon ourselves to build a solution in PP and were able to do so.

    However, when I mention our product I get shut down because “you are using Excel” to do payroll which security wise is not acceptable.

    I think it is a valid point. Do you know if there are solutions to this problem so the PP solution can stand the test of auditors? (e.g. databases cannot be edited, measures cannot be changed or deleted, etc.). Thank you.

    1. The short answer, Peter, is to use a server. Take the model you’ve built, and then lock it down on SharePoint rather than distributing the workbooks. Or perhaps even better, “upsize” it to SSAS Tabular (which is a painless, one click transition).

      Either way though you can restrict the number of people who can modify the model, down to a tiny audience, AND audit the change logs.

      The best psychological point you can make is this:

      Power Pivot is really ‘just’ SQL Server Analysis Services – Microsoft’s Most Robust and Industrial-Strength BI Server – running behind the scenes, with Excel merely as the “host” and the “frame.” The logic in your model is NOT Excel logic. Not at all. It’s just cleverly disguised so that Excel PEOPLE can learn it. It’s a gateway drug to Analysis Services, which is as secure and as “Enterprise” as it gets in the BI world.

  9. The DAX solution is undoubtedly clever stuff and I aspire to being able to rattle off a DAX line like that in the blink of an eye
    What I would say about the Power Query approach is that it is far more transparent to the uneducated. Long lines of DAX remind me of mega formulae in Excel which bamboozle the uninitiated (and sometimes intentionally so). I’d want that DAX broken down into helper columns / components so the business logic can be clearly passed on and understood by the recipient
    As a VBA developer I am now actively promoting the use of Power Query and Power Pivot as an alternative to writing VBA wherever possible

  10. With Microsoft’s acquisition of Revolution Analytics I’m hoping we see Power R being released some time in the future. DAX is a formula language that’s so tied in with Excel. It’s so cryptic with its multi-nesting of round brackets and square brackets. Being tied in with Excel it suffers from not being extractable or scriptable, or ‘diff’able’ for comparison or source code control purposes. And with this blog post trying to demonstrate how to diff consecutive cumulative values I’m thinking this is so much easier to do in R or with the Python pandas module. In both of these free, long established, open source offerings the data frame is a built-in, first class entity. Windowing functions such as lead, lag, moving averages etc are present. The Power Query M language seems to be Microsoft’s attempt at making a scriptable language along the lines of R. I’d expect to see windowing functions to be added to M at some time in the future. Windowing functions implemented in very clever, cryptic DAX I’d suggest can only be authored by level 500 DAX experts, not your average ‘BI for the masses’ user.

    1. “Hey Rob, it’s London calling.”

      I waited until now to moderate this comment so that I could respond to it. I wouldn’t mind seeing Power R, either – let’s hope that happens! 🙂

      That said, I think you might be more in the minority than you’d expect. As “clumsy” as nested Excel formulas are, they’ve proven to the the most widely-known programming language in the world, by FAR. I absolutely *do* understand that they don’t “scale” as elegantly as other programming languages, but the usability debate begins and ends with their rampant popularity. (We could spend a long time discussing WHY they are so popular, but let’s leave that for another day).

      Once we start talking about things like Python, yeah, I get the advantages, but still – very few of Excel’s power users will ever even LOOK at it. (I would expect similarly widespread rejection of M, fwiw, but Power Query does offer an awesome GUI that improves every few weeks, AND that GUI can be used as an educational tool for people learning M, much like macro recorder helps people learn VBA. But if it was just a raw IDE that let you write M… forget it.)

      Now let’s get to this last point, about how DAX struggles with “diff from previous row.” That’s true actually – it’s harder to do even than in Excel. But as someone who’s worked in DAX professionally for 5+ years, I can say with confidence that this scenario actually doesn’t come up that often.

      Sounds like BS from the outside, I know, but this is specific to prior ROW. You want prior month, day, year, quarter? All quite simple. It turns out that most of the time, single-row calcs aren’t that useful. It’s the aggregates that matter. And calculating aggregates is DAX’s sweet spot.

      By banishing A1-style reference and focusing on aggregate scenarios, DAX is a killer for the majority of analysis tasks. That comes at the cost of row-wise navigation calcs being tricky. It’s a good tradeoff though.

      The one thing DAX is *not* good at, today, is statistical stuff. Linear regressions, etc. I can *do* them in DAX, but it’s not optimal. Of course, once I get into linear regression land, I quickly discover that *I* am the weakest link. I am just not sufficiently trained or experienced, and I can write the formulas five different ways, yielding five wildly different conclusions, and I have no idea which one is the most valid approach.

      Anyway, circling back. There are 30+ million Excel power users – those who sling pivots and VLOOKUPS all day – and we can teach ANY of them to use DAX, with lifechanging results. No 500 level audience required – seriously, I wish you could sit in on one of our classes and see how bright these folks are, despite never being programmers or BI pros. But even that massive audience is merely a fraction of the total Excel user base. So in that sense, I agree that it’s not for the masses, but in every group of 15 people, 1 of them IS a future DAX “programmer.”

Leave a Comment or Question