By Dany Hoter

Intro by Avi: I have often been asked, about ways to provide an “input” to the Power Pivot model from Excel. Disconnected Slicers are a popular way to do this. But with Excel and Power Pivot, there is always more than one way to accomplish a task. Dany shows us how, while making histograms easier to use. He uses a filter dropdown, which even works with Excel Online – inside a browser! Here is the end result, read on to learn how and download file.
image

Take it away Dany…

Introduction

Creating a histogram in Excel based on Power Pivot is not as easy as it should be.

The method I use is no different from what others have already blogged and wrote about. There is even a solution that calculates the number of bins in a histogram with a formula that is based on the total number of cases.

My take on the problem was to let the user choose in run time what is the interval between each bin as a percentage and to show the number of bins accordingly.

Business Case

The model contains data about a service that is in its infancy and so the users experience a relatively high number of errors.

The managers responsible for the service posed the following request:

“We want to see a histogram of the sessions showing how many users have experienced no errors in all the sessions they initiated, how many experienced errors in 10% of the sessions, 20% of the sessions … all the way to these poor users who saw nothing but errors in 100% of the sessions (Told you it is in early stage…)

Solution

We had a table with the Users, a table with Sessions and a table with Errors for Sessions.

image
Diagram View of Model

The actual histogram was created by adding the disconnected table ‘Buckets’ for the bins and creating the following measures:

– Total sessions from the sessions’ table
Sessions:=DISTINCTCOUNT(Sessions1[SessionId])

– Sessions in error from the errors table
FailedSession:=DISTINCTCOUNT(Errors[SessionId])

– Fail ratio as a ratio between the previous two
Failure:=DIVIDE([FailedSession],[Sessions])

– Users for a histogram bin as below
Bucket := CALCULATE (
    COUNTROWS (
        FILTER (
            Users,
            AND (
                [FailRatio] >= MIN ( Buckets[From] ),
                [FailRatio] < MAX ( Buckets[To] )
            )
        )
    ))

As this was pretty easy I wanted to add the option of the user to define the # of buckets at run time The problem is that the bins from and to columns are calculated columns that need to be refreshed in order to produce new values. So the user need to input the new interval between buckets and then trigger a refresh that will take the new value and use it to calculate the columns in the buckets table.

The buckets table is a linked table with columns that looks like this:

clip_image002

It has 30 rows to allow for a large number of bins. The value of the Interval is used to calculate the bins’ ranges.

A single cell pivot based on the buckets table is used to trigger a refresh just for the buckets table.

image

The first state show 10% as the interval and 10% as the value from the buckets table (average(interval) )

The user changes the value to 20% which recalculated the Excel buckets table.
After that, user refreshes the tiny pivot that says Refresh here by right clicking and choosing refresh.
image

At this point the buckets table is refreshed in the model and the users are spread between the new bins.

image 

Voila – bins are now shown in 20% intervals! The last column shows sessions at 100% error. All the remaining columns in Buckets table with the from value above 100% are hidden by the pivot chart because no data is returned for those.

In order to enable the input from Excel web, I have used pivot filter as the input method. For Desktop use you could simply allow user to type in or select a value from dropdown, you could even auto-refresh once the input value is changed.

Download File

  Subscribe to PowerPivotPro!
X

Subscribe

This Post Has 9 Comments

  1. Why do it this way as opposed to using an unrelated table with min and max? With the unrelated table there is no need for a refresh, curious if there are virtues here I am missing?

    1. It is using a disconnected table for defining the bins.
      The only reason to use refresh is to allow the users to change the interval between bins in run time.
      If you are happy with the default , you do not need to refresh ever.
      Any time you want to recalculate calculated columns you need to refresh the table.
      The table is tiny so refresh is, fast as long as you don’t accidentally trigger a “refresh all”.

    1. Vovka, download the file (link at the bottom of the post) and check it out. It is using Conditional Formatting. That’s what I love about PowerPivot, you don’t chuck all your years of Excel Knowledge. You combine that to do super cool things.

  2. Hello Dany,

    Thank you for the insights. An accountant has me to prepare a similar document to analyze company receivables. Very timely!

  3. This is great! I was looking for this for a while now. My questions is, what if you need the last bucket to be a “catch all” for all the outliers? So you have buckets that go up to 100 but the last bucket is anything greater than 100?
    Thanks so much.

  4. I’m trying to do the same thing except SUM instead of count records. Any chance you can give me a tip? It’s not liking anything I do!

  5. That was a great post! I ran into one issue with zero bucket results not displaying in my histogram. Found that if you add +0 to the end of the Bucket := CALCULATE formula it will force the zero bucket results to display.

Leave a Comment or Question