Data IS Life…and “Life, Uh, Finds A Way.”

P3 friends…for those of you in the know (and for those who aren’t), Power Query (or PQ for short) is one of the cornerstones of the Microsoft BI Suite. It’s a tool built into Excel 2016 and Power BI Desktop that allows us to extract our data (78 data connectors and GROWING), transform our data (it’s more than meets the eye), and finally load our data into a Data Model. Wait…so PQ extracts, transforms, and loads data? Those words spell out ETL as an initialism!

DNA Data Gene

Anyone familiar with SQL Server Analysis Services (SSAS) or SQL Server Integration Services (SSIS) might have caught that. I carefully chose to describe PQ that way because in many many ways it reproduces what was historically done with these other tools. While PQ is not a complete replacement for these tools, it does allow you to transform or clean your data for nearly 99% of the data scenarios you encounter.

For a more in-depth post about what Power Query is and when to use it I’d recommend reading this week’s post by Rob. It’s called M/Power Query “Sets Up” DAX, so Learn DAX (and Modeling) First and it does a fantastic job of breaking down when and where transformations should be applied, and when to use DAX. Matt Allington also wrote up a great technical post titled Power Query as an SSIS Alternative that gives a great (and DEEP!) dive into some great ways to leverage Power Query in conjunction with Power Update. Between these two posts you’ll get a pretty good grasp of how much of a heavy hitter Power Query can be!

Knowledge is a Terrible Thing to Waste

Now that we have a basic grasp of what Power Query is, I’d like to impart some knowledge to you. Specifically I’d love to introduce you to a new post series called Power Query (M)agic. The goal of this series is to share some of the wonderful best practices that we at P3 have acquired over the years. These practices have been either been self taught, or graciously passed down from other BI Ninjas. These are practices that can be implemented in nearly any Excel or Power BI Report. The best part is MOST of these practices won’t require you to be a Power Query expert, or even an advanced user!

As long as you have a BASIC UNDERSTANDING of Power Query you’ll be able to implement these yourself. With that being said let’s dive in to today’s technique shall we?

Technique #1 – One Data Source To Rule Them All
One Source to Rule Them All

So I LOVED discovering this technique! I’ve seen the Reference button in Power Query for a while…but honestly I never took the time to figure out how to fully utilize it. Avi Singh is owed credit for showing me how to fully use this feature. So the idea behind this Reference technique is to create a BASE QUERY that acts as the singular data source for that connection to all other queries.

Hold on Reid, don’t ALL PQ queries already have a source?? Well YES, technically whenever you create a new PQ query, the first step shown IS called Source, and it does point to your data. However, let’s imagine a scenario where you’ve built out a report that has 20+ PQ queries in it, EACH with their own INDIVIDUAL data source. Now if multiple PQ queries point to a data source that changed location or got updated…you’d have to update every single query that references it. Not ideal in my book!

This is where the Reference feature comes in. Reference let’s you create new query, referencing the original query as the data source in any number of additional queries. So if the data connection ever needs updated, now you only have to update ONE QUERY! Have I lost any of you yet? Don’t worry, I’ll paint you a picture with words (well…screenshots) to help explain this better.

Traditional list of PQ queries, each with it’s own separate data source:

Power BI queries without single data connection

New PQ source query, referenced by SEVEN other queries:

Power BI queries with single data connection

Does that first image of queries look familiar? This is a typical setup in most report models…just a long list of queries. The problem (as stated earlier) is that they all have individual sources, many of them pointing to the same connection or location! The solution and output will look something like the second image; a single source query that is referenced by any other query that requires that shared connection. It’s relatively straight forward to implement, so “come with me if you want to learn” and I’ll explain how it’s done. Smile

Step 1 – Create a new PQ query to ANY table in the database you’re wanting to reference:

Power BI query create data connection

Step 2 – DELETE the navigation step, and rename your query. Now you have a base connection to any table in the database:

Power BI query finalize data connection

Step 3 – Right click the query and select Reference to create a new query with this as the data source:

Power BI query reference query

Step 4 – Navigate to the DB connection the table you want, and rename the query (DONE!):

Power BI query re-create connection

Simple right, and NOT a single line of code required! This step can be repeated for as many tables or views that you have in that database. But wait…what if there’s a report that’s already been built, and we want to update existing PQ queries? Well there’s a solution for this too! Although for this we will technically we will have to write some M code, it’ll be painless though I promise. All it will require is a single line of code to be written, that will override the Source code of your existing query.

Go to the source step of the existing PQ query you wish to update, and replace the SQL Connection with the connection query name:

Power BI query rename existing connection

When typing in the name, just make sure to include a pound symbol and quote (#”) before the PQ query name, and a quote () following it. If you’re ever having trouble figuring out how to write this Source step, you can create a new Reference Query and copy the code from there, then delete that reference query. Honestly one of the great things about this function is how similar the steps are for doing this with any data source. The majority of the steps I’m outlining here are transferrable! Well that’s it for today P3 Nation, stay tuned for the next part in this series so until then.

  Subscribe to PowerPivotPro!
X

Subscribe

Reid Havens

Reid Havens is a Principal Consultant for PowerPivotPro and the Owner of Havens Consulting Inc. His main goal is to collaborate with individuals and organizations by helping them analyze data to understand their business. Using his formal backgrounds in technology and organizational management Reid has worked with various local and multi-national companies. He is also an adjunct professor at Bellevue College, guest lectures at the University of Washington, and provides corporate trainings centered around teaching Business Intelligence, Reporting, & Design. 

This Post Has 36 Comments

  1. Reid, when I tried this, I couldn’t get it to work. Here are the steps I performed:

    1. Opened Power BI, clicked on Get Data
    2. Selected SQL Server Database as the source type
    3. Connected to a local copy of AdventureWorks2012
    4. Selected Import as the Data Connectivity Mode
    5. In the Window, selected the Sales.SalesOrderHeader table and clicked Load
    6. In Power BI, clicked on the Edit Queries button in the ribbon
    7. In the Query Editor window, deleted the Navigation step
    8. At this point, the source value in the function window read: = Sql,Database(“”, “AdventureWorks2012”)
    9. I changed this value to: = Sql,Database(“OPSQUERY”)
    10. When I hit enter, I got this message:

    Expression.Error: 1 arguments were passed to function which expects between 2 and 3.
    Details:
    Pattern=
    Arguments=List

    11. I tried changing the value to = Sql,Database(“”, “OPSQUERY”), which I did not expect to work, and sure enough, this time I got the following message:

    We couldn’t authenticate with the credentials provided. Please try again.

    Obviously, the second argument in the Source formula, at least using this approach, needs to reference the actual database name.

    Can you let me know what I need to do differently to get this to work?

    Thanks.

    1. And an edit to steps 8 and 11 – I put the term SERVERNAME in LT & GT signs to try and signify it as a placeholder, and apparently those are read as HTML or some other markup. So step 8 should correctly read:

      At this point, the source value in the function window read: = Sql,Database(“SERVERNAME”, “AdventureWorks2012”)

      And step 11 should correctly read:

      I tried changing the value to = Sql,Database(“SERVERNAME”, “OPSQUERY”), which I did not expect to work, and sure enough, this time I got the following message:

      1. Hi There, happy to help. So OPSQUERY is the name of my server, you’d want to put your server name in there. For the source query you’re creating, you shouldn’t need to edit that source step. Simply create the new query connected to a table on the server you want. Then deleting all the steps until ONLY the source step is left (unmodified). Then you right click and reference that step to attach other queries to that.

        1. Thanks Reid. To be honest, I wasn’t quite sure what you meant by, “create the new query connected to a table on the server you want,” so what I did was right clicked on one of the queries and selected duplicate, then removed the Navigation step from the duplicate query, selected it as a reference, and completed the rest of the steps. That worked, and from there I was able to complete the other steps, and point existing queries to the “source” query.

          Very cool, thanks again.

          1. And yet another edit – I did have to leave the formula as: = Sql.Database(“SERVERNAME”, “AdventureWorks2012”), e.g., leave the database name in as well, but other than that, the process looked exactly as you documented it here.

          2. Glad you were able to figure it out! I was basically referencing steps 1 and 2 with a new query creation. But duplicating works just as well! All roads lead to home as they say. 🙂

          3. Ahhhh,,,got it! So what you’re saying is, just create a new query, the way you always would, delete the Navigation step, leave the Source step formula as-is, and then use that modified query as the reference source for any subsequent queries to that DB. That makes sense.

            Thanks again, great article.

        2. Wish you guys demo this on YouTube it’s easier to follow on a video then trying to decipher the meaning of the procedures

    2. The source should be only OPSQUERY i belive. If I understand correctly you made a new source query called OPSQUERY and you are trying to redirect an old query to this one.

  2. Hi Reid,

    Thanks for the valuable lesson.

    So if your sources _are_ very different Excel workbooks. Can you start with a query that connects to the folder containing them, and then you reference that query many times and that would do the trick? Or would this not fool Excel and it would still establish many connections in RAM and fall apart as it does always when I have many sources?

    Thanks,
    Bro

    1. That one is a little trickier. Are you attempting to create a folder query that pulls ALL workbooks from a folder and appends them. Or are there multiple files in a folder that where just want to update the folder path to that single folder, with each individual excel workbook having separate queries, and not appended?

        1. Then you could use a parameter to contain the table connection string and replace that in each of your source connections. Unfortunately I don’t have a ton of time today to write at length the process to do this. But essentially a parameter can allow you to reference a body of text (E.g. “C:\User\Desktop” ) somewhere else. So essentially in the Source Excel code you would replace the chunk of text that was originally hard coded as “C:\User\Desktop” with a concatenated reference of the parameter, which supplies the file path and then whatever the file name was.

          It would look something like below as a SOURCE folder path for an Excel file. Hope this helps steer you in the right direction. A quick google search will show you how to create the Parameter itself.

          = Folder.Files(ParameterName & “\ExcelFile.xlsx”)

    2. Hey Bro (for real, bro, is this how one shortens Ibrauheem to? That’s amazing!)
      In my practice, referencing local files is a dangerous game to play.
      A better way is to put them online, and reference online versions – it is more scalable this way.

      And if you put them into sharepoint storage (which gets generated for you every time you create a Power BI Workspace or an MS Office email group, or a MS Team team, and so on) – then you indeed can reference folders.

      Otherwise, Reid’s parameter solution is what you need.

      1. Taras, yes I definitely second your opinion. Online files are often a best practice if you CAN utilize that approach. Though grabbing data from flat files is still less optimal that having all your data in a secured database or tabular model, one can dream. 🙂

  3. Hi, i am confused. Why should I not use a parameter for the source in all queries?
    Therefore I can just update the parameter and then all queries are up to date?!

    1. If all you need is simply a folder path or SQL Connection string, yes a parameter would work. However let’s say your SOURCE query was a master table that requires about 10 Power Query steps before it then goes into 5 distinct tables used in your model. Then this source query will let you apply those steps only once before it becomes the tables in your model, opposed to recreating those 10 steps each time in each distinct query.

  4. awesome…where was this post 3 days ago! We moved from Dynamics CRM to Dynamics 365 (which ended up with slightly different odata feeds) and I had to go through 9 separate queries to update the source to the new feed. The method described here would have saved me that grief.

    Sooooo, having said this, if i go ahead and create a new source query, is it possible to remap those existing queries to reference the new source one?

  5. I use the built-in Power Query parameters plus a parameters table that resides within Excel to parameterize my calls to the database (see PQ_Parameters in the WHERE statement). This allows me to dynamically alter the queries depending on spreadsheet events. I was not able to find a way to do this with the source reference method, thought it may be possible.

    Source = Sql.Database(Server, Database,
    [Query=”SELECT txtCalendarYearMonth AS CalendarKey, Date AS Month, FiscalYear, CalendarYear, CalendarQuarter
    FROM Cdtbl_CalendarYearMonth
    WHERE FiscalYear >= ” & PQ_Parameters(“MinFY”) & ”
    AND FiscalYear <= " & PQ_Parameters("MaxFY") & " "
    ])

  6. I always used the “Edit Queries > Data source settings” function to quickly change a lot of queries, especially when all of the queries look at different tables/views in the same database. Your solution seems to assume this is the case, right? When would you use this option versus the one in your post?

    1. If you’re only wanting to provide quick changes to data source settings you can definitely use that. However, a large benefit of having a base query is that it can also contain multiple steps that need to be applied to any subsequent query that references it. I’ve had projects where there is a master table in a data base that ends up creating about 5-7 distinct tables in a model I’m building. However there can be multiple transformations that all of those tables require before splitting off into separate entities. A “source” query provides both the benefit of a single connection to update AND can hold those core steps as well. So in a way I’m future proofing this way in case there’s ever any edits I need to make to the “core” source that I don’t want to apply to multiple tables. 🙂

  7. I have experimented with the reference connection method versus using the built-in parameters in PQ and I HIGHLY recommend sticking with the built-in parameters to reference your database. Think of the built-in PQ parameters as constants. They should be used for values that do not change often (like the server name and database name). To use them, launch the editor, select the drop down from Manage Parameters => New Parameter and create a parameter named “Server” and enter your server name (MyServer). Then create a parameter named “Database” and enter your database name (MyDB). Then you can reference your parameters like this:

    Source = Sql.Database(Server, Database)

    So far the effect of this is very similar to what you describe in your article. However, if you wish to filter the source table using dynamic parameters, the reference method falls apart. The method for building and accessing dynamic parameters is described here:
    http://www.excelguru.ca/blog/2014/11/26/building-a-parameter-table-for-power-query/

    Combining the built-in PQ parameters with the dynamic Excel parameters allows us much greater control over what we are importing and ultimately reporting. For example, I have a forecast table with over 30 million records and I only want to display records from the specific forecasts that a user selects. One way to do this is import the whole table and use DAX to select the right forecast. A better way to do this is limit the table that is imported to only the records that are needed and then load those into the data model. This approach requires dynamic parameters.
    To accomplish this we can inject a parameterized SQL string as an argument into the connection like this:

    Source = Sql.Database(Server, Database,
    [Query=”SELECT ID, MonthOfService AS CalendarKey FROM Exp_history
    WHERE Cycle = ‘” & PQ_Parameters(“Cycle”) & “‘
    AND Version= ‘” & PQ_Parameters(“Version”) & “‘ “
    ])

    In the example above, the dynamic parameter table is named PQ_Parameters and “Cycle” and “Version” represent the values pulled from the dynamic parameter table.

    This approach does not appear to be possible with the reference connection method as the connection to the database has already been established preventing us from injecting our SQL string as an argument. The next thought might be to utilize the dynamic parameters to filter only the records we want as a separate step like this:

    #”Filtered Rows” = Table.SelectRows(ExpTable, each ([Cycle] = PQ_Parameters(“Cycle”)))

    Unfortunately, this does not work resulting in the error: “Formula.Firewall: Query ‘ExpTest’ (step ‘Filtered Rows’) references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.” There may be a workaround, but I have not found it yet and it is not worth pursuing given the ease of using the built-in PQ parameters. As far as I can tell there is no easy way to utilize dynamic parameters within the reference connection method. Given the ease of using the built-in PQ parameters and limitations of the reference connection method, I can only conclude that using the built-in PQ parameters is the best practice for establishing a connection to a database.

  8. Hi Jeff,

    Appreciate the additional information, I’m sure some people will find use in parameters as well. While Parameters can also be used to maintain a single connection string, that dives more into the intermediate level of Power Query knowledge and my main goal is to present methods of optimizing models with no to minimal (copy/paste) M writing.

    I think using parameters is neither better or worse foe establishing a connection to a database. A large benefit of having a base query is that it can also contain multiple steps that need to be applied to any subsequent query that references it. I’ve had projects where there is a master table in a data base that ends up creating about 5-7 distinct tables in a model I’m building. However there can be multiple transformations that all of those tables require before splitting off into separate entities. A “source” query provides both the benefit of a single connection to update AND can hold those core steps as well.

    You can also still have the best of both worlds and use a parameter to provide the connection string in the source power query as well.

    1. Hi Reid,

      Thanks for the article and reply. It has provoked me to dig deeply into this issue.

      In general, the connection to the database should be as generic as possible (server name, database name) to allow for customized data pulls. When using the reference connection method, the Sql.Database function has already been invoked so we have moved past a golden opportunity to customize the data pull.

      The base query should NOT have a query inserted (unless you plan on running the same query against every table).

      As for the repeatability of steps, that can easily be accomplished by saving your steps as a function and then calling the function as needed resulting in queries with only a few steps.

      For these reasons, I stand by my assessment that using the built in PQ parameters is the BEST way to establish and maintain a connection to a database though I appreciate your approach and your willingness to share your ideas. Best wishes.

  9. I too have seen the Reference button in Power Query for a while… and just didn’t know what to make of it or how to use it. Thanks so much! This will be a part of my models going forward.

    Could also be used in wide data sets that you are trying to make palatable to the Power Pivot engine. For Example; Load the raw dataset that includes customer product and sales, delete the first 5 rows, split colums, rename columns, etc. Then use the Reference button as the starting point to make three diffrent queries, Customer, Product, and Sales.

Leave a Comment or Question