Hello P3 World! My name is Krissy Dyess and I have had the most amazing experience working as a full-time consultant at Power Pivot Pro since April 17, 2017!

Today is 1/9/2018 so that would be…

266 calendar days, if you DON’T include today’s date

267 calendar days, if you DO include today’s date

182 calendar days, if you include ONLY business days, excluding Saturdays/Sundays & 6 holidays

… and there are many other FLAVORS of responses that could be listed here as well!

Funny, how a simple intro can branch off into many different paths of interpretations – this is pretty much the world of data and analysis that we live in as data professionals!

For any given question, there can be MANY different ANSWERS, depending on what question you are REALLY trying to answer. Additionally, there can be MANY different SOLUTIONS!

In my 20+ years working with data, it seems like a good deal of my time is spent PREPARING the data. What does this mean? It means applying logic to structure the data in a way that is best for your data model and/or analysis.

Today, I would like to share a common data problem and 2 different SOLUTIONS used to PREPARE a set of data for Power BI.

And like most problems, there is usually more than just one or two ways to create a solution. So, if any of our readers out there would like to share any additional ways to solve this problem, please feel free to leave your ideas in the comments below!

The Problem

Sometimes it’s necessary to create a data set that contains a consecutive list of dates and a value for a data point that will be measured or visualized. In this example, the Source Data shown has 7 records, starting with 07/31/2017 and goes through 08/10/2017. Notice that there is a gap between 08/05/2017 to 08/10/2017.

Source Data and End Goal

Source Data and End Goal

The end goal is to have consecutive days. Now, the end goal could be consecutive days through the last date available (08/10/2017) or through today’s date or etc. In my example, I will go through today’s date but depending on your problem and your end goal this could be different.

SOLUTION #1 – Using SQL

Why solution in SQL? Well… because until Power Query came around, SQL was my best friend when I needed to do any data manipulation!

For those of you looking to learn SQL, this will be a good example to follow along with. For those of you that don’t care to learn SQL, I’ll also show another solution using Power Query.

I’m using SQL Server Management Studio (SSMS), my local workspace, and then creating a New Query workspace.

New Query

New Query

In order to create a table and populate that table with example data, I select New Query and use the following scripts:

SQL Step 1

SQL Step 1

SQL Step 2

SQL Step 2

SQL Step 2 View

The SQL Script is entered in the New Query window, Execute or F5 is used to run the SQL script and the results are displayed in the Results Pane.

image

Executed Query

Voila! I’ve created a table with example data by running the above code!

Source Data

Source Data

And we can see the 4 day gap between 08/05/2017 to 08/10/2017.

The next step would be to have a complete consecutive set of calendar days available in a table. Now, you might have a calendar table that you can already use for this step or you might not! There are many ways to create a calendar table! But for this example, I’m going to show a different technique to get a set of consecutive calendar dates for the time frame I’m interested in building.

SQL Step 3

SQL Step 3

 

SQL Step 3 View

Voila! I’ve created a consecutive calendar table with example data by running the above code!

NOTE:  If you are following along, your dates will be different based on when you run the code – this doesn’t matter for understanding the concept.  The purpose is to just get a set of dates, they can be any dates.

Consecutive Calendar Table

Consecutive Calendar Table

SQL Step 4

SQL Step 4

 

SQL Step 4 View

We can see that we have 161 rows of data starting 07/31/2017 going out until 01/07/2018.

Join Calendar

Join Calendar

SQL Step 5 – Final Step

Which brings us to the final step, which is to join the result of the above step with the actual store quantity data to get the actual quantity for days where a quantity was available. And to do this, I created the final SQL query:

SQL Step 5

Step 5 View

Final Join

Final Join

And now we have the final SQL query & result set –

image

Final Results

Final Results

SOLUTION #2 – Using Power Query

Now, can we get to the same results using Power Query? Yes, we sure can!

Power Query Step 1

Let’s start with the example data. Since I started with SQL, I can just copy and paste my same example data from SQL into Excel. Then in Excel, I can use Insert > Table and to create a table named ‘StoreQuantity’. So far, so good!

Source Data

Source Data

Power Query Step 2

Then since I’m using Excel 2016, I can use Data > From Table/Range to load and launch the Power Query Editor window with the example data.

From Table/Range

From Table/Range

Power Query Step 3

 

The exact same example source data is now loaded into the Power Query Editor! Voila!

Power Query Editor

Power Query Editor

Power Query Step 4

I’ve renamed the source data query to ‘Staging_StoreQuantity’. And I’ve created a “Staging” folder for the data preprocessing steps.

Renamed Query

Renamed Query

Power Query Step 5

For the next step, I can right click on ‘Staging_StoreQuantity’ and Reference or Duplicate another starting source of data.

Now, there are performance reasons that I might choose Reference over Duplicate.

If I were to choose Duplicate, Power Query will locate and retrieve the data again. Depending on the source of the data and the size of the data, this could take some time. However, if I choose Reference, then I’m using the data that is already loaded in the Power Query Editor. So, in this case, I’m choosing the Reference option!

Reference Source Data

Reference Source Data

Power Query Step 6

I’m going to rename the new query to ‘Staging_StoreQuantityAllDays’. And use the Home > Group By button to create a unique list by [store_id] of the first [create_date] where we have a quantity.

Group By

Group By

Power Query Step 7

Next, I can add a custom column expression using Add Column > Custom Column and the Power Query List.Dates function to create a list of consecutive dates for each [store_id]. I will use the [min_create_date] as the starting date in the expression and go out 365 days.

List.Dates(#date(Date.Year([min_create_date]), Date.Month([min_create_date]), Date.Day([min_create_date])), 365, #duration(1, 0, 0, 0))

Custom Column

Custom Column

After adding the [dt] custom column as a List, you can use the Expand to New Rows…

Expand to New Rows

Expand to New Rows

… to get one row for each item in the List. Boom!

Distinct List

Distinct List

Just a few more steps to change the [dt] format to a date, filter steps to keep only the records through the current date … and Boom!

We now have a table with consecutive dates for the [store_id] starting with 07/31/2017 through to today!

Consecutive Date Table

Consecutive Date Table

Power Query Step 8

 

Finally, I can use the Home > Merge Queries as New to create the final query.

Merge Query

Merge Query

Staging_StoreQuantity is the resulting column, we use the Expand double arrows to select ONLY the [qty] column from our original data source.  I deselect the use original column as prefix so that only the [qty] name is returned for the column versus Staging_StoreQuantity.qty.

Expand Double Arrows

Expand Double Arrows

I renamed the final query to ‘StoreQuantity’ and I’ve created and named a new folder called “Data Model”, to store the final query that will be loaded to the data model and that’s it!

Final Query

Final Query

You can see that we can use either SQL or Power Query to take source data and manipulate the data into the SAME desired results!

Which method is BETTER? Well, that depends. If you don’t have SQL or prefer not to use SQL, then Power Query is a GREAT option!

As for me, I have to refer to the song… “Make New Friends (Power Query) But Keep The Old (SQL)… One Is Silver And The Other Gold!”

Happy New Year Everyone Smile

Please see the attached Excel file for the Power Query solution and the attached SQL script file for the SQL solution.

Forget bending spoons with your mind – there’s no money in it.

It takes a special kind of mindset to “bend” data (and software!) to the human will.  As this article demonstrates, we at PowerPivotPro can twist Power BI into a pretzel if that’s what an organization needs. (A robust, trustworthy, industrial-strength pretzel of course).

The data-oriented challenges http://foofacing your business require BOTH a nimble toolset like Power BI AND a nimble mindset to go with it. And as Val Kilmer / Doc Holladay once said, we’re your huckleberry.

Connect with the Experts

  Subscribe to PowerPivotPro!
X

Subscribe

Krissy Dyess

Prior to joining PowerPivotPro, Krissy spent 16+ years at a Sales BPO company and held roles in Reporting, Analytics, Modeling, Strategy, Management, and Consulting teams. She primarily used data from MS SQL Server, SalesForce.com, Tableau, Cisco, and Avaya for clients including: Facebook, Google, MS Bing, Skype, AT&T, T-Mobile, Sprint, IBM, Lenovo, Toshiba, FedEx, TIAA CREF, and Nissan. Krissy is a SQL data ninja queen but decided to explore learning DAX and Power BI in her quest to clean and wrangle “dirty” data from disparate client systems and to put that data into the hands of the Business Analysts. Her passion is helping others discover the full potential that Power BI & PowerPivotPro can provide. 

This Post Has 14 Comments

    1. Hi Kathleen! Thanks for reading and taking time to provide your feedback! My goal was to bring awareness of how SQL and Power Query could be used to achieve the same end goal. Mission accomplished 😉

  1. List.Dates([min_create_date], 365, #duration(1, 0, 0, 0))
    a little bit shorter 🙂
    Or you can add [max_create_date] on grouping and create this:
    List.Dates([min_create_date], Number.From([max_create_date] – [min_create_date]) + 1, #duration(1, 0, 0, 0))

    1. Hi Maxim! Yes, I like the shorter version that you’ve shared MUCH better: List.Dates([min_create_date], 365, #duration(1, 0, 0, 0))

      And it would probably be a good idea to get both the MIN and MAX on the Group By step – this would tell you start date and end date for each [store_id] and that will be useful to have as well!

      If anyone reading is new to Power Query, you will need to select the “Advanced” radio button option and then “Add aggregation” button in order to add a “New column name” entry and to have both the MIN and MAX columns added in the Group By step!

      And then, as you’ve shared, you can use your 2nd Power Query M code in a Custom Column, to create a list of dates for each [store_id] starting from the [store_id] start date through to the [store_id] end date only.

      Thank you for taking time to contribute your approach!

  2. We can also set data type on [min_create_date] as Number or Decimal and then create a list of just like this:
    {[min_create_date]..[min_create_date]+365}
    without List.Dates function 🙂
    After expanding we can set data type back to Date

    Great tip on the benefits of Reference over Duplicate – Thanks!

    1. Yes, this also works! Thank you for sharing your idea 🙂

      It’s great to see these M code examples and it shows that there really are MANY ways! In some cases, one approach to a problem could be better than another. In some cases, it’s just a matter of preferences.

      The key is that there is always a way!

      Thanks again!

  3. How about this as a simple dates generating SQL query:

    ;with cte as
    (
    select cast(‘20180109’ as date) as datecol
    union all
    select dateadd(day,1,datecol) as datecol
    from cte
    where dateadd(day,1,datecol) <= '20180131'
    )
    select datecol
    from cte
    order by datecol
    option (maxrecursion 0)

    1. Yes, this totally works! Thanks for sharing your SQL wizardry!

      Then to match the temporary table called #Dates in the post example, just make the following changes:

      with cte as
      (
      select cast(‘20170101’ as date) as [dt]
      union all
      select dateadd(day,1,[dt]) as [dt]
      from cte
      where dateadd(day,1,[dt]) <= getdate()
      )
      select
      cast('A' as nvarchar(1)) as [join], — Will be used to join
      [dt],
      row_number() over (order by [dt]) as [dt_order]
      into #Dates
      from cte
      order by [dt]
      option (maxrecursion 0)

  4. Thanks for sharing Krissy. If you had access to both SQL and Power Query would you typically prefer one or the other? Are there certain processes that work better in one over the other?

    1. Hi Ethan!

      Great question! My answer is… it depends 😉

      If I am limited to using only VIEWs in SQL, I would need to be cautious of performance, which would depend on the complexity of the data preparation needed. In this case, Power Query might be the better choice.

      If my data “lives” mostly in SQL, I’ve had great results creating a scheduled Stored Procedure to “build”, “clean” and “prep” my data set(s) prior to using Power Query to load the data into my data model. Generally, a SQL physical table will perform much faster than a SQL VIEW when loading into Power Query.

      Additionally, if your SQL Server has good performance then it makes the most sense to do as much data preparation in SQL – to leverage the power of SQL Server.

Leave a Comment or Question