So How Many Servers Do I Need in My PowerPivot SharePoint Farm?
As people increasingly move up from just dabbling with the add-in and start leveraging the publish/schedule/share/secure benefits of the PowerPivot for SharePoint infrastructure, I am getting this question more frequently.
How many servers? How much RAM? And, less often but just as important, how many CPUs?
Time for an old joke: “Ever hear the one about the statistician who drowned in a river that was, on average, only 3 inches deep?”
That’s the whole joke. And as time goes on, I only find it funnier. And funnier. But first, some basics.
The Three Primary Server Roles
For the vast majority of PowerPivot deployments, you will mostly need to concern yourself with three server types: Excel Services, PowerPivot Engine, and Data Source DB Servers:
A few notes on the diagram up front:
- I recommend viewing the larger version – there is information on the diagram (in the notes) that I will not repeat below.
- Don’t read too much into the 3/4/2 ratio of Excel Services/PowerPivot/SQL – I included different numbers of servers in each tier intentionally, to illustrate that you can scale each tier out independently. But, for instance, I do NOT expect you will need 3 Excel Services boxes for every PowerPivot box (you will likely need fewer).
- Although separated above, server roles CAN be combined onto single servers – for instance, many folks can get by with an “all in one” server where everything above lives on a single box. Also, even in a multi-server farm, one of the MS-recommended configs is to combine Excel Services and PowerPivot onto a single box, and then deploy as many of those combined boxes as needed.
- There are some elements missing from the diagram – for instance, there is a Web Front End (WFE) role, and a SharePoint Configuration Database role, but in my experience, PowerPivot does not put unique strain on those elements (except for config db disk space). Your mileage may vary of course.
Early Planning Efforts
Back in the Spring, I sat down to the very serious task of “how much hardware will we need on our SharePoint servers in order to handle user loads?” And naturally, I made a very serious spreadsheet to model it out:
Wow, what a spreadsheet. I mean, I even used the =POISSON.DIST() & =BINOM.DIST() functions – clearly, such spreadsheet horsepower indicates accurate results!
Turns out, that spreadsheet was nothing more than an amusing theoretical exercise with little bearing on the real world.
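For the curious, the spirit of that model is easy to reproduce outside of Excel. Here is a minimal sketch of the same Poisson/binomial concurrency math in Python – every input (user count, activity probability, the 99% service level) is a made-up placeholder, which, as you'll see below, was exactly the problem:

```python
from math import comb, exp, factorial

# Hypothetical inputs -- outright guesses, which was problem #1 with the model
users = 200          # licensed report consumers
p_active = 0.05      # chance a given user is running a query in any given minute

# Binomial: P(exactly k of n users active at once), like =BINOM.DIST(k, n, p, FALSE)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson approximation with lambda = n*p, like =POISSON.DIST(k, lambda, FALSE)
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam = users * p_active  # expected concurrent queries

# Smallest capacity c such that P(concurrency <= c) >= 99%
cum = 0.0
c = 0
while True:
    cum += poisson_pmf(c, lam)
    if cum >= 0.99:
        break
    c += 1
print(c)  # plan capacity for this many simultaneous queries
```

Neat math, garbage-in-garbage-out results.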
There were multiple problems with that spreadsheet:
1. It relied on outright guesses as to how often/how intensely consumers would utilize the published reports.
2. I was not yet aware of the CPU-gobbling power of slicers, a power that makes real-world queries many times more CPU-intensive than I had modeled.
3. Most importantly, it assumed peak usage would occur in the morning, when consumers are most in need of fresh information.
Solution for 1) and 2)
These are, in some sense, the simplest to address: stand up a pilot solution and observe the usage characteristics. You will learn a lot about usage patterns, as well as what those patterns do to CPU and RAM.
But even better: you may choose to simply ignore these factors for now and focus on problem 3 instead. Here is why:
Peak Load is Probably Experienced During Scheduled Refresh
If you’ve looked at the larger version of the diagram above, you have seen a hint as to what I’m about to say here:
At Pivotstream, our nightly refresh process puts far more strain on our servers than our users do, and is what we now plan our hardware around.
I will explain further, so you can evaluate whether your situation will be similar. We receive new data on a nightly basis (typically late at night). That data gets ingested into SQL server, and then our PowerPivot refresh process begins.
Once PowerPivot refresh begins, we have about 6 hours, tops, to get all models and reports refreshed, so that when business opens in the morning, everyone has access to fresh insights. 6 hours sounds like a lot… until you discover that it isn’t.
A lot happens during scheduled refresh!
Keep in mind that the PowerPivot model (the embedded database living inside your PowerPivot workbooks) must first be refreshed – let’s call this phase “Model Refresh.” This can put a lot of strain on your SQL servers just in terms of supplying the data, since PowerPivot v1 pulls a fresh copy of all tables (no incremental refresh).
As the data flows into the PowerPivot servers, a LOT of CPU power goes into compressing that data into the storage format. During the refresh process, RAM usage steadily climbs as well.
As model refresh nears completion, RAM usage spikes upward quite a bit, sometimes by as much as 50-100% of the size of the resulting workbook. CPU usage also spikes during this phase.
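To give a feel for what that spike means for RAM budgeting, here is a hedged back-of-envelope sketch. The workbook size, spike factor, and concurrency figures are all hypothetical placeholders, not Pivotstream numbers:

```python
# Hypothetical sizing: peak RAM during the end-of-Model-Refresh spike.
workbook_size_gb = 2.0       # compressed model held in RAM during refresh
spike_factor = 1.0           # spike of up to 50-100% of workbook size; assume worst case
concurrent_refreshes = 4     # workbooks refreshing at the same time

# Each refresh holds the model PLUS the spike at its peak.
peak_per_refresh_gb = workbook_size_gb * (1 + spike_factor)
peak_total_gb = peak_per_refresh_gb * concurrent_refreshes
print(peak_total_gb)  # RAM headroom to budget just for refresh spikes
```

The point is that the spike multiplies across concurrent refreshes, so sizing RAM off a single workbook's footprint will undershoot badly.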
Once the model is done refreshing, PowerPivot for SharePoint then triggers a refresh (via Excel Services) of the Pivots and Cube Formulas in the workbook, so that when thumbnail screenshots are “snapshotted” for the report gallery, those reflect the latest data (let’s call this phase “Pivot Refresh”).
Yes, that process initiates via Excel Services, but that refresh immediately results in a meaty set of queries sent back to the PowerPivot servers for processing. So, CPU usage spikes again. And the model in question is held in RAM while that happens, so that RAM can’t be recycled into the pool to be used for model refresh.
Remember, this all happens for a single workbook. And if you hang a bunch of report-only workbooks off of a single “hub” model workbook (as we do all the time), the amount of time it takes to finish Pivot Refresh can actually exceed the time for Model Refresh.
If you have to cram all of this into a narrow nightly window, chances are that you will need more hardware to pull it off than you ever will need for normal daily usage!
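To see why a 6-hour window evaporates faster than you'd expect, here is a simple feasibility sketch. Every number below is a hypothetical placeholder – substitute your own measured refresh times and parallelism:

```python
# All numbers are hypothetical placeholders -- substitute your own measurements.
workbooks = 120              # models to refresh nightly
model_refresh_min = 8        # avg Model Refresh time per workbook (minutes)
pivot_refresh_min = 6        # avg Pivot Refresh time per workbook (minutes)
parallel_refreshes = 4       # simultaneous refreshes the PowerPivot tier sustains

total_work_min = workbooks * (model_refresh_min + pivot_refresh_min)
wall_clock_hours = total_work_min / parallel_refreshes / 60

window_hours = 6
fits = wall_clock_hours <= window_hours
print(round(wall_clock_hours, 1), fits)
```

With these made-up inputs, a seemingly modest workload already blows past the window, and the only levers are faster refreshes or more parallelism (i.e., more hardware).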
Rough Guidelines for Your Own Situation
This post is running a bit long, so I will try to be succinct in closing:
- If a significant percentage of your PowerPivot models and reports will be refreshed nightly, refresh is likely going to be a larger peak strain than normal interactive usage.
- Even if you only refresh weekly (or less often), if the execution window between “data is ready” and “reports must be ready” is basically still a single night, then refresh is still likely going to drive your peak hardware need.
- If you think “Peak Load = Refresh Process” is likely the case for your organization, I suggest ignoring interactive usage projections during your hardware planning process, and instead developing some prototype workbooks (with rich data sets and multiple report sheets with 100% realistic slicer sets, measure complexity, etc.), and then putting those workbooks through the refresh process to get a baseline.
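Once you have that baseline, extrapolating to a hardware estimate is straightforward arithmetic. A minimal sketch, assuming purely linear scaling (real scaling rarely is) and entirely hypothetical baseline numbers:

```python
import math

# Hypothetical baseline from one prototype workbook's refresh run.
baseline_refresh_min = 12      # measured: one workbook, Model + Pivot Refresh, end to end
baseline_cores_busy = 3        # CPU cores saturated during that run

planned_workbooks = 80         # workbooks in the full nightly workload
window_min = 6 * 60            # the nightly refresh window

# Naive scale-up: parallelism needed to fit the window, and the cores that implies.
needed_parallelism = math.ceil(planned_workbooks * baseline_refresh_min / window_min)
cores_needed = needed_parallelism * baseline_cores_busy
print(needed_parallelism, cores_needed)
```

Treat the output as a floor, not a plan – contention for RAM, disk, and the SQL tier means the real number is higher. But it beats guessing at interactive usage.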