I have been working fairly extensively with Finance teams, leveraging Power BI to build financial Models, dashboards and reports. (See some of my recent posts Financial Dashboards, Personalized Dashboards and Q&A). Regardless of the team we’re working with, a very common scenario we run into, is the need to combine System data with Manual data. To clarify:-

  • System Data: Represents the data coming from official system sources (could be a Data Warehouse, SAP etc.)
  • Non-System/Manual Data: This may be manual data maintained by humans or this may be “System” data that is not tracked in the official system. Could be your own SQL/Access Database on the side or a system used only by your department and not the whole company.


Mashups can be fun, if done the right way

While there are many scenarios where you would need to combine System and Non-System data, for this post I want to cover one such specific scenario. Watch below or continue reading… (link to download files at the end of the post).

Scenario: What if the System Categorization does not meet your needs 100%?

Let’s say you have pulled the Data table and the Lookup tables directly from a System (official) source. However, due to some reason you are not 100% happy with the way a given Lookup table categorizes your data. For example, let’s say we are pulling expenses from the system, but need to make a few tweaks to how they are categorized in the system.

Here the changes needed are:

  • “Office Supplies” needs to be categorized under “Office Expenses”
  • “Professional Services” and “Other Services” need to be grouped together as “Miscellaneous”


Need to override some of the system mapping

There are many scenarios where we see such a requirement

System mapping is out of date: A change has occurred in the operating system which is not yet reflected in the Reporting Systems. But of course, it needs to be reflected in the end report.
I have seen this one play out so many times. Often teams rush headlong on a project to deliver some cool new change. The impact on reporting, or the need for reporting around this change is wholly ignored. Only when the change is live, does someone raise this issue.

Of course the DGP (Data Gene Person) on the team had been screaming his/her head off about this for so long, but was ignored by everyone. And now, they turn to him/her and say. “Jim/Jane, just make it work!”. Sigh!

System is chronically behind the Business Needs: This is a slight variation of the one above. While the above can potentially be solved by adequate management; this one is perhaps unavoidable. In my experience, business always moves ahead of IT/System. Business has a bias for action, for change; business is constantly evolving. IT teams, even the best I’ve seen, have a bias for stability (read inaction). You need to prove something is needed, before the change can be implemented.

That means, there would always be a gap between where the business and business needs are versus what the IT Systems deliver. This is not a ding on IT, just a reflection on the motivations behind the two sides.
(Have a different take on this? Let us know by leaving a comment.)

Big Boss likes to see data a specific way: The higher up the totem pole we go, I realize the room for discussion gets smaller. Often when I am creating a report for a CTO/CFO, I may not even have direct access to them. In many cases, I am told that the report needs to be delivered in a specific format, with specific groupings.

Classic Approach: Hostile Takeover

The classic approach (think the Old Excel world) would involve taking over the whole mapping table. In our case, we would make a copy of the Accounts table in Excel and then edit the specific values that need to be changed. This gives you complete control over the mapping table.

Classic Approach: Hostile Takeover
Copy the complete System Table and modify the values as needed

And this gives us the end result we desire. But if you have ever used this approach, you already know the largest pitfall. It is hard to synchronize system changes. As items get added, deleted, modified in the System table, it is really hard to sync all those changes to your Manual table

As items get added, deleted, modified in the System table, it is really hard to sync all those changes to your manual table

What we really want is a smarter approach.

Modern Approach: Precision Strike (Exception Mapping)

We really do not want complete control over the system mapping table. All we want is for the specific changes needed to be reflected, almost layered upon whatever the system mapping table is providing. Instead of copying over the complete system table, we only track the exceptions in our manual table. Then we would smartly combine these manual exceptions with the system data using Power Query. Only tracking exceptions in the manual table makes for a much more compact manual table and also ensures that all the system changes (additions, modifications, deletions) would flow through to our model unmolested – without us having to manage them manually.

Only track exceptions in the manual table


Only track the exceptions in our manual table

Here is a graphical view (Query Dependencies view) from Power Query.


Overlay the Manual data on top of System Data

Here is the Power Query code.


Power Query to splice in System and Manual data (click to enlarge)

I have found this Left-Anti Merge approach to be more deterministic. As opposed to Appending Manual data and then removing duplicates or some other approach. Surely there would be other ways to do this, but this works for me.

And here is the desired end result in the report:


Our manual exception mapping have been overlaid on the system mapping

Combining System and Manual Data: a Universal Problem?

As I mentioned in the opening, I have encountered the need of combining system and manual data in nearly every Power BI* project I have been associated with. And M (Power Query) & DAX (Power Pivot) engines have enabled me build robust and elegant solutions to effortlessly combine system and manual data.
(* I use Power BI in the generic sense, encompassing both Power BI and Excel/Power Pivot/Power Query world.)

Tell us what your experiences have been combining

  • system and manual data
  • system and external data
  • or generally data from multiple sources

Do you have any of your favorite tricks that have helped you?

Download Files: ZIP

Power On!
– Avi Singh
LinkedIn Twitter

  Subscribe to PowerPivotPro!
X

Subscribe

Avi Singh

Avi Singh has personally experienced the transformation and empowerment that Power BI can bring - going from an Excel user to building large scale Power BI solutions. His mission now is to share the knowledge about Power Pivot and Power BI. 

This Post Has 12 Comments

  1. A surprisingly common problem for me down the years, the difference now being that there are tools at your disposal to make life so much easier

    I think the ‘key’ from my experience is to make that mapping table which bonds data sources together as maintainable and flexible as possible so house the mapping table as a spreadsheet based table in it’s own right that gets imported into the data model

    It’s interesting you mention this now. I have a slightly related problem in the sense that I need to be able to provide the ability to accommodate financial adjustments in my data model, i.e. to allows core sales values to be adjusted in my fact table by virtue of looking up an adjustments table

    I wondered if you had come across this ‘sales adjustments’ scenario and wondered how you had approached it Avi?

  2. I have handled Sales Adjustments under a few scenarios

    a) Adjustments are made at the lowest grain: So if the Sales data is at SKU, Zip Code and Day level, the adjustments are made at the same level. This is the dream scenario, since it’s straight-forward to handle. As easy as appending the table, albeit I keep a custom column in there as “DataSource” = System or Adjustment so I can see these two separately if needed. Since this is the dream scenario, this rarely happens.

    b) Adjustments are at a different grain than the Data Table: So the adjustments are perhaps at State or Country level, maybe Monthly instead of daily etc. Here I would keep these as separate tables and make the adjustments in DAX. Effectively the new “Adjusted Sales” measure can only be shown at the lowest grain across both tables (Sales and Adjustments). So after adjustments you would not be able to show Zip Code level or daily trends anymore of the “Adjusted Sales”. But that’s just the price you may.

    Corollary a) There is a scenario where we have implemented a) for real. The scenario was that the end-user did not provide me an adjustment table at the lowest grain (which is virtually impossible for any respectable sized dataset)…instead they provided me rules on how the adjustments need to be made.
    E.g. Split the “Canada” revenue 60% to NorthEast region, 30% to Central region and 10% to Western region.
    Then I would generate the “Adjustments” in Power Quert following these rules. Of course the table I generate would be at the lowest grain; and would fall into scenario a) above.

    Power On!

  3. As a BI solutons architect (& CGMA) I am often workign with both FDs and BI analysts & I often come across the same issue where the the FD wants a Boards view and the BI guys need a managment view whereas the accountants want their auditors view all of the same financial data often wranged with non accounts trading data. Theres always someone whom doesn’t adhere to reporting based on the Chart of accouts (COA) for reporing.
    I tend to keep it simple and give them (Data Steward) the solution they know best and let them manage their personalized reporting dimensions in excel tables.

  4. Thanks Avi, so the situation I face is as follows:

    Sales data at lowest grain is by store. A Sales data adjustment will need to be made at a Channel Level. The scenario is that Christmas Food orders by internet are not captured in Sales data because they are processed by a different system as a complete one off at Xmas time.

    An adjustment therefore needs to be made to increase internet sales in the Sales Fact table once the additional data source is received
    There are some ‘pseudo’ store codes where the associated Channel = Internet so I’m guessing that I can assign my adjustment in it’s entirety to one of these codes for the particular week concerned? Internet sales aren’t split out by store because by their very nature they are not store driven (cash sales) rather orders placed via internet

    So I have an adjustments table that has:

    StoreCode, Week, OverrideAmount

    In my fact table I look up on StoreCode to establish the presence of an OverrideAmount in the Adjustments table. Should one exist I use that else I just defer to the original amount in the Fact Sales Table. I’m only going to find an adjustment for the Internet pseudo store code as per adjustments table

    This sounds too easy but wanted to run the approach past you

  5. This is such a common requirement. Great to see a blog post about it Avi.

    I think handling adjustments justifies another blog post too. It’s surprisingly common, especially when dealing with employee performance. (E.g. I want to exclude this case from the statistics because it wasn’t the employee’s fault it took so long to resolve, or I want to treat this sale as a November sale as it’s not the reps fault that it didn’t ship before the cut-off)

  6. This is not a rhetorical question:
    Why not just look up in the “exceptions table” (very small), and if not found, then look up in large system-based table? Not as fancy, but very simple.

  7. Like Kim, my own preference is to add a calculated column to the system dimension table as loaded into Power BI containing the override values. The calculated field pulls in the local override rollups or descriptions from our local overrides table, if a row with matching keys exists, and uses the system table values if not. That makes it easier for me to report under either corporate or local group mapping as desired, and ensures the overhead is handled during the weekly load rather than as local slicing occurs.

  8. This is such a common issue that it must take many flavors. Combining data from multiple ERP/OLTP systems, or from ERP and external (e.g. marketing data from an external source), or any combination thereof is always difficult. Having to manually synchronize the translation tables or synchronized DAX formulas is a constant effort.

    I think a lot depends on how “new” the links you’re creating are. In our Power/xxx roles we are often the first to make connections between data and departments that previously never saw the need to synchronize. Inevitably in those cases you’re going to have to create and maintain those links until the organization decides if they’re worth formalizing.

    Where we have the ability to control the input our default response is to being tracking the external data in the source system if possible. Something as simple as matching latitude/longitude to our courier locations… we used to run an Excel macro to add lat/long to the location extract, but IT added it to our master location table. Vendor-managed locations were tracked outside our inventory system until we put a flag in the table to indicate who manages it.

    Financially a location may belong to one organizational unit, but logistically it may belong to another. An office in our southern region has to have supplies placed on a southern courier route, even though we then bill back to an entirely separate organizational entity. Whether inside the ERP system or outside on spreadsheets someone has to keep up the links.

    As an employee we can agitate to have the value of our work reviewed and press for a more robust steady-state solution. As a consultant I think you are forced to come up with a fast solution but I suspect you often don’t have time for giving your client a roadmap to automate the connections. Maintenance of the links can soon gobble up more time than is spent in generating the report. I don’t have a good answer to that problem except to hope that the value of the report is such that the organization is willing to make the effort to get to a steady-state.

  9. For reporting purposes it is enough to do a double grouping.

    Build a group “Office Expenses” including the line items “Office Expenses” and “Office Supplies”.
    Based on this group build a second group “Miscellaneous” including the line item details “Other Expenses” and “Professional Services”.

  10. What would you suggest to do if you have an existing tabular model to which you connect, but have no impact on. So you can only connect to the model, but not change it at a model level? So you would want to update / enrich a full tabular model with local data… I haven’t found any convenient way for this.

  11. This is a great idea. I never really used the Left/Right Anti unless I was looking for something missing from my master table. In my MS Access world, it was doing a left or right join and looking for nulls in the source table. But you’ve shown me a great way to use it outside of looking for errors.

    I had an interesting project this week where I found out I needed to add data. We needed a table in human format – so it was Item, Quantity, and then 12 months of data in 12 columns. So I did all the work in PQ and did a final output where it used the Table.Pivot, then did a Replace.Values to convert Nulls to 0. A Pivot table in Excel as the end result would not work in this scenario.

    The Replace.Values blew up though each time if there weren’t 12 months of data to look through.

    So I added a list of the 12 dates I needed (dynamically calculated from a cell in Excel), populated with a fake item number, added a fake qty, appended to my source data, *then* pivoted the data, cleaned the nulls, then cleaned my fake 12 rows.

    So all kinds of ways to need to add data to a set of fixed system data.

    There may (probably?) is a better way to use Table.ReplaceValues without being dependent on a fixed number of columns, but I got my project to work.

    Left-Anti added to my quiver! Thanks again for the great idea.

  12. Do you know how to do incremental data load? I’ve set of data from googlesheets that is updated everyday, and i want these data can be combined with previous data that already in powerBI.

Leave a Comment or Question