Today I am going to spend some time trying to convince (and demonstrate to) Excel Power Pivot users that it is easy and valuable to learn some simple SQL code. SQL is one of the easiest languages I have ever taught myself and you can do it too. The reasons SQL is easy to learn include:
- It has a very simple syntax that uses English language expressions.
- You can start with simple short queries and get more complex only when you need to.
- When you need to do something more complex, the Internet is just a quick Google away.
- The chance of getting something wrong is relatively low (for these simple queries anyhow – see warning at the end).
Why You Should Care about SQL
If you use Power Pivot for Excel and you directly source your data from a database, learning some basic SQL will help you get significantly more value from the process of loading your data. With some SQL skills, you can add significant flexibility to your data acquisition and loading processes into Power Pivot. The same opportunity exists with Power BI but with a different UI experience – I will not cover the Power BI UI in this post.
Using SQL will:
- Allow you to import a non-contiguous subset of records into your data tables that filters out unwanted data – something that is otherwise difficult to achieve with the standard UI.
- It can help you significantly improve the compression and hence performance of your larger reports.
- Allow you to import smaller lookup tables based on the subset of records in your data table(s).
- Allow you to store these loading variations in a bespoke workbook rather than in the source database.
But enough of that – let’s get into it.
The Power Pivot Import Dialogue Box
One of the first dialogue boxes you will ever see in Power Pivot for Excel is this one.
When I first saw the 2 options in this dialogue, it was very clear to me. I would ALWAYS use option 1 and I would NEVER use option 2. Now I know better and I am going to share with you why I now routinely use option 2.
Note: You can only use SQL if you have a supported database as a source. You can’t write SQL code to selectively import CSV data for example.
Changing the Import Approach of a Loaded Table
I want to start with a data model that already is loaded with data built and is up and running. The sample I am using is a typical Adventure Works report that I built some time ago for another blog post. You can Download the files I used from here if you want to play along. Make sure you unzip the files into a folder – don’t use the files from a zipped folder else the next steps may not work.
The reason I am doing this is that this is my most common use case – I have a report (typically large – maybe 200MB, and someone wants a variation).
Reconnect The Database
If you are following along with the samples, the first task is to reconnect the Power Pivot Workbook to the Access Database. To do this, go to the Power Pivot Window, select Home\Existing Connections and then select the Access connection (#1 below) and then click Edit (#2).
Then just browse to the location of the Access database (it is in the zip file with the Excel workbook), select the Access database, Click Open\Save\Close. This reconnects the Excel file with the database.
The Basic SQL Syntax
The basic SQL syntax is simply as follows
SELECT one or more columns FROM a table WHERE a set of selection rules apply
It can be a lot more complex than this, but the point is that this is enough to get started.
Use Case 1 – Import a Sub Set of a Data Table
Let’s assume that you want to refresh a workbook so it only contains data for the top 10 selling products. First create a copy of the existing workbook (the one that contains all your data) so you don’t change or corrupt your master copy. Then use the new workbook (still containing all the data) to find a list of the required product codes you want in your new workbook. In my example I want the top 10 selling products, so it is easiest to do this in a pivot table like shown below.
Once you have this list of values (product keys), you could go into the Power Pivot window, click on Design\Table Properties, and bring up the Import Wizard dialogue. From there you could manually select the product codes you need as shown below.
Now if you just want to re-import the data for 1 or 2 products, or if you need to import a contiguous range of product codes, then this is a pretty easy process. But if you want to select a non-contiguous set of data like in this example, it is a lot more work. And it gets worse the more items in the list you need to import. But this problem is easy to solve with some simple SQL. Here is how I do it.
The first step is to deselect all product codes and then select any 1 item in the list – click OK. It doesn’t matter which one as it is going to be changed anyway.
Then in the same dialogue box, switch from Table Preview to Query Editor as shown below.
You will then see the SQL code generated by the wizard (shown below).
The SQL code is not very well formatted and is hard to read, so the next thing I like to do is format the SQL code. I normally use the following website but you can use any online service you like. http://www.sql-format.com/
I copy and paste the SQL code from the Power Pivot window above and then paste it into the formatting service and then click Format. It gives me something that looks like this
I then copy and paste the code back into the dialogue box in Power Pivot, and it looks like this below.
Now the reason I selected any single product from the table preview editor earlier in the process is because this forced the table preview editor to write some more of the SQL code for me. See how it added a WHERE clause? If there is no WHERE clause, then the query will simply bring back all the records.
The last part of the code is as follows
FROM [Sales] WHERE ([ProductKey] = 214)
The next step is to replace this WHERE clause with a list of product keys. I am using an IN statement to do this as follows:
FROM [Sales] WHERE [ProductKey] IN (312, 310, 313, 314, 311, 361, 353, 363, 359, 357)
Note that I have removed some extra unnecessary brackets from the original code line 2. You don’t need to be a programmer to work out how this works. It basically says “bring back the columns I have asked for from the Sales table where the product key is in this list of values above. The reason the list of columns is so long (in this case) is because I selected a subset of columns when originally importing my data. If I selected every column, it would simply say SELECT * FROM [Sales].
To get the list of product codes, you can use the normal Excel tricks to concatenate them into a single string separated by commas. To do this, I normally get my list of values into a Pivot Table, sort the pivot based on sales (in this case given I am after the top 10 products), copy the product codes from the Pivot Table and then paste them as values somewhere in the spreadsheet.
Once you have done this, you can manually combine them into the IN statement needed using Concatenate or similar. Once I had done this a few times I figured some VBA code would be better. I wrote this VBA code (below) that allows you to select/highlight a range of cells containing the data (product numbers saved as values in this case) and then run the VBA code to create the IN statement. Note this range of values can’t be the pivot table, it has to be values in cells.
Here is the code if you want to use it (one code for text, one code for numbers).
Sub CombineCommasSQLCodeText() Dim Cell As Range, mySelection As Range Set mySelection = Selection For Each Cell In mySelection If Cell.Address <> mySelection.Address Then If Cell.Address <> mySelection.Range("A1").Address Then If Cell.Value <> "" Then mySelection.Range("A1").Value = mySelection.Range("A1").Value & "', '" & Cell.Value Cell.Clear End If End If End If Next Cell mySelection.Range("A1").Value = "in ('" & mySelection.Range("A1").Value & "')" End Sub
Sub CombineCommasSQLCode() Dim Cell As Range, mySelection As Range Set mySelection = Selection For Each Cell In mySelection If Cell.Address <> mySelection.Address Then If Cell.Address <> mySelection.Range("A1").Address Then If Cell.Value <> "" Then mySelection.Range("A1").Value = mySelection.Range("A1").Value & ", " & Cell.Value Cell.Clear End If End If End If Next Cell mySelection.Range("A1").Value = "in (" & mySelection.Range("A1").Value & ")" End Sub
After running the VBA or manually concatenating the values you will have this:
IN (312, 310, 313, 314, 311, 361, 353, 363, 359, 357)
Now all you need to do is copy the IN statement and add it to the SQL code in the Query Import dialogue. Paste it in, click validate (to make sure you haven’t made any mistakes) and then save.
The sales table is then re-imported with a sub set of records.
Use Case 2 – Reduce the Size of a Lookup Table
Now that the Sales table contains a small list of 10 products, it is not necessary to bring in every product from the products table. It is easy to fix this using the exact same approach as above – in fact you can use the exact same WHERE clause (assuming the name of the product code column is the same in both the Sales and the Products tables).
- Go to the products table
- Click on Table Properties
- Change the product key filter to select a single product
- Switch to the Query Editor
- Format the Text with an online formatter
- Copy the formatted SQL code back to Power Pivot
- Reuse the IN statement from before to limit the selected products to be the same ones in your sales table.
You will end up with this. Validate the formula and then save.
Use Case 3 – Sort data on load to improve compression.
As I covered in my last blog post, you can gain significant compression improvements if your workbooks are a lot larger than 2 million rows. To gain the benefits, you will need to sort the data on load. To do this, simply do the following.
- Go to your data table
- Switch to the Query Editor
- Format the query if you like
- add an ORDER BY clause to the end.
See the example below. Now in my test workbook there are only a couple of thousand rows, and so this wont make any difference to my workbook. However if you have millions of rows, this can make a big difference (read the previous blog I mentioned above to find out why). You will need to work out which column to sort by in your own data with some trial and error and understanding of the cardinality of your data.
Use Case 4 – Reduce the Size of an Indirect Lookup Table
This next example is not beginners SQL and also will not work with Access, but it is a good demo of what can be done. You can copy the pattern I show below or invest some time learning some more complex SQL. Keep in mind that I am an Excel user with no formal SQL training, and I can now write this code quickly and easily when needed. If I can learn it, other Excel users can learn it too. Basically what this code does is:
- Find a list of customer codes that have purchased the products in question
- Creates a temporary table of these customer codes
- Uses the temporary table to extract a list of just those customers into the customer table.
I use this pattern to retrieve a subset of dimension tables from a SQL Server DB all the time. If your data source is SQL Server you would do this as follows (same general process as before).
Go into the Customer table and copy the SQL Code (just copy what is there), format the code at an online formatter.
This time I have left it in the formatting tool to show you what I am doing. If I am doing this in real life, I typically complete this step in SSMS, but the process shown below is fine too.
Now go into the Sales Table and copy the SQL code from there too. Paste the SQL code from the sales table and paste it before the SQL from the Customer Table into the editor below. You will have something like this.
Then I edit the first SQL statement (Sales) so that all it does is create a list of Customer Numbers that have purchased this product (shown below).
Then I turn this code snippet into a temporary table that I can reuse in my query. There are various ways of doing this, but here is what I do (shown below). This gives me a temporary table called ‘temp’ that contains all the customer numbers I need in my Power Pivot Workbook.
Then I change the second part of the SQL statement to just extract those customers as follows.
Copy the code back to the customer table in Power Pivot, validate and save.
Consolidation Case – Bespoke Workbooks
All of these examples have 1 thing in common. The code to extract a different set of data is saved inside the Query Editor in a duplicate copy of a Power Pivot Workbook. If you need the same query over and over, then it would be much better to ask someone to create a query/view for you in the database itself. However for those times that you want to build an ad hoc report, I think it is much better to store your queries in the workbook itself. It is accessible to you (the report author) and doesn’t pollute the source database with one off queries that will rarely be reused. So the end state of the steps outlined above is to create this bespoke workbook that contains all the information for the specific bespoke need without polluting your SQL Server DB.
A Warning to Finish
One thing to be aware of – you can do bad things to your database server if you write bad SQL. Literally you can bring the SQL DB to a standstill. If you do this, it is possible someone who administers the server will come after you. Worst case they may ban you from refreshing your workbooks against the server. So I suggest you start out simple and make sure you know what you are doing before you get too tricky.
I would love to hear from others about what other SQL tricks they use with Power Pivot. I have thought about creating parameter tables to store the list of values and then use Power Query to manage it all, but I am not an expert on Query Folding so I have not gone down this path (yet anyway).