Turn a Folder of Exported CSV or TXT or even Excel files into a single, easy-to-analyze table, in an eyeblink.

Imagine Being Able to Just Collect Exported CSV/TXT/Excel Files in One Folder,
Name them Properly/Consistently, and Magically Combine them Into a Single Smart Table

Intro from Rob

You know my old joke, that “Export to Excel” is the third-most-common button in data applications behind OK and Cancel?  Well it’s not really a joke – the world RUNS on export to Excel, and in most cases, the export button actually produces a text file – a .CSV (Comma Separated Values) or .TXT (Text – typically tab-delimited).

Dealing with those exported files has been VERY labor-intensive in the past, but there’s great news:  the newer tools from Microsoft are AMAZING at handling these exported files and making terrific sense of them – in an eyeblink.

Dealing with Exported CSV Data Used to Be Hard. But now, it is Actually FUN! Seriously.Seriously, if you’re toiling away today, doing ANY of the following:

  1. Creating a new version of a spreadsheet EVERY time you export a CSV
  2. Struggling with data being treated as text in Excel when you need it to be a number
  3. Lacking any ability to trend/compare the exported data over time
  4. Just generally spending much more time getting the right data connected with the right analyses

 

…then you need to know that your life can be much better, TODAY, just by using the newest capabilities from Microsoft.

Gil is here to help.  Take it away, sir…

Already Good, Just Got Better

The Power BI team has recently released an enhanced “combine binaries” experience as part of November 2016 update to Power BI Desktop. (Jargon Alert:  “Combine Binaries” is a scary term.  Instead it should be named “Magically combine multiple files together into one table and make me SUPER happy.”)  The improved experience can drastically help you to import multiple Excel or other files from a folder and avoid writing advanced query functions. But today we will focus on a specific scenario, which is so common that it deserves this special post – Handling CSV files.

In fact, today’s blog post is actually the first post in “The CSV Series”. I hope you will enjoy it. To celebrate the November update of Power BI Desktop, we will review the improved experience, and will walk you through one of the most common scenarios that is now so easy to implement – Importing multiple CSV files from a folder, including parts of their filenames.

NOTE:  for now this “enhanced” capability is only available in Power BI, but Microsoft has been pretty consistent about bringing these benefits to Excel.  Stay tuned for my next post in this series, to learn how you can do it in Excel, who lacks the Power BI improvement (at least for the time being. Hopefully the improvement will find its way to Excel soon).

It all starts with a somewhat-hidden yet magical button – You can see it highlighted in the following screenshot:

The magic button to append CSV files from a folder - Now includes the filenames in the appended table

The Button Looks Like an Elevator Going Down
(But Really It’s “Magically combine multiple files together into one table and make me SUPER Happy”)

 

Prior to November update, clicking the magical button (AKA combine binaries) was only effective when you had a folder of CSV files. All CSV files were appended into a single file. There were only two caveats:

1- It only worked on CSV files (If you had other file types, such as Excel workbooks, you had to create a query function to process a single file and then invoke the function on all files. This is now done automatically for you in the November update of Power BI).

2- If your CSV files contained crucial meta data in the filename, you “lost” that data after clicking the combine binaries button.

For example: In the screenshot above you can see that we have the cities information in column Name. Clicking the combine binaries will append our sales data from each city, but we will no longer be able to associate the data to the city, if the city is not mentioned in the CSV files. This limitation was so common, and for many users, who weren’t familiar with query functions or were reluctant to manipulate the Power Query formulas, this magical button had become useless.

So today, with the new functionality in Power BI Desktop, you can append all CSV files together and keep their filenames in the appended results. Let’s review the new experience on a folder of CSV files.

Open Power BI Desktop and click Get Data icon.

Combine multiple CSV files from a folder using Power BI

Select File and then select Folder and click Connect.

Combine multiple CSV files from a folder using Power BI

Click Browse to browse to a folder with CSV files. Select the relevant folder and click OK. Click OK again in the Folder dialog.

Select the folder of CSV files to combine in Power BI

In an extra-probably-unnecessary step will will see the content of the selected folder. Click OK.

Review the content of the folder with CSV files before we combine the files in Power BI

In the Query Editor you will notice the small icon in the header of the first column. This is our Combine Binaries button. Click on it, and the magic will happen.

Click the Combine Binaries button to append all CSV files from a folder in Power BI

With this update of Power BI Desktop, a new step is introduced after you click the Combine Binaries. You will see a preview of the first file (For example, with Excel workbooks, you will get to see a navigator window that will allow you to choose which worksheet or table to load). For our CSV scenario, you will notice the CSV dialog.

In this dialog you will see a preview of the data from the first CSV file. Under the hoods, Power BI will automatic detect which delimiter to use, and may even promote the first row as headers. You can manually change the delimiter, or define how Power BI should handle data types. You can set it to automatically detect data types based on first 200 rows, or the entire dataset or you can even opt out the detection of data types.

Review and set the delimiter before all CSV files from a folder are combined in Power BI

The results are awesome – We can now import and combine all CSV files from a folder, and include their filename in the appended table.

You can see that the filename is now located in the new column Source.Name.

Power BI Desktop appends all CSV files from a folder, including the filename

You may notice that Power BI Desktop creates new artifacts on the Queries pane. If you don’t consider yourself an advanced Power Query user, I suggest that you just ignore those new “strange creatures”. Imagine that that they don’t exist.

But if you insist on scrutinizing your Queries pane,  follow my blog post here to learn more about these artifacts.

Turning File Names into Meaningful Columns!

Sometimes, your exported data does NOT tell you “where” it came from.  For example, in our data here, each CSV was created by filtering some other application by city and then pressing Export – but the resulting report/export did NOT include the City as a column!  When we saved the CSV, we re-named it “London.csv” etc. so that at least the filename “remembers” where we got it from.  (Another common example is the same exact export run over different timeframes rather than different locations – one for January, one for February, etc. – in which case the technique below will also be 100% applicable).

So our next step is to extract the city from column Source.Name. It’s pretty simple.

Select the column Source.Name, and right click on its header, then select Replace Values.

 

Remove .csv from filename to extract the region data from the filename

In the Replace Values dialog, set .csv as Value To Find, and click OK. This step will remove the extension (*.csv) from all the values in column Source.Name, leaving the cities intact.

Remove .csv from filename to extract the region data from the filename

Now that we have extracted the cities from our filenames, we can rename the column to Region (Renaming is simple. Same way you rename a filename in a folder).

Renaming the column Source.Name to Region as part of the import of CSV files from a folder in Power BI

To prove that we were able to combine all the files, and have filenames as regions, let’s sort our data by Product. To sort the data by Product, click the sort button. It’s similar to sorting tables in Excel.

Do you see all the cities in column Region? Isn’t it awesome?

Filename is now included as a region as part of the import of CSV files from a folder in Power BI

Before we close the Query Editor, it’s important to ensure that our numeric/date columns have the right data type. In our example, we can change column Sales to Decimal Number, and column Date to Date.

All regions are included as part of the import of CSV files from a folder in Power BI

That’s it. Clicking Close & Apply in Home tab, will load all our CSV files and their filenames from the folder. Here is an example for a simple report that I’ve built.

Here is our report on Power BI - All regions are included as part of the import of CSV files from a folder in Power BI

Now, for the sake of completeness and awesomeness, let’s check out what happen to our report, if we add new data for Chicago region by adding a new CSV file Chicago.csv to our folder.

If we add a new CSV file in the folder, its data will be included as part of the import of CSV files from a folder in Power BI

Clicking Refresh on the report will immediately append the data from the new CSV file along with the other CSV files in our folder, and include region Chicago in our report. And this is real magic.

Conclusions

Following this blog post, I hope you are encouraged to download the latest Power BI Desktop and import multiple CSV files from a folder. You can now easily include the CSV filenames to refine the appended table with supplementary information.

I hope that Microsoft will soon update Excel with this improved experience (they usually do).

Stay tuned for my next post in “The CSV Series”.

  Subscribe to PowerPivotPro!
X

Subscribe

Gil Raviv

Principal Consultant at PowerPivotPro and a blogger at DataChant.com. Former Senior Program Manager on Microsoft Excel team. Led Power Query integration in Excel 2016. 

This Post Has 20 Comments

    1. Thank you Wyn,
      I agree with the issue you reported. That is why I recommended to explicitly change the column types after the append. Honestly, even before this feature, I don’t recall a single append that kept the types of the appended tables

  1. ~SQUEALING ON THE INSIDE~
    Now the combine button works on not just CSV, and to be sure i tested using the add-in version of Power Query (not PowerBI).
    Prior I was using some custom queries as a workaround.
    Just WOW!! The pace of value adding feature being added is getting hard to believe.

  2. Excellent feature that I would have used many times in my career. The workarounds were always very clunky. Thanks for the good explanation– I’ll pull this out next time we want to read in multiple files.

    1. Thank you Nick
      Yes, it is possible to append multiple worksheets into a single table, and there is also a very easy way to do it (yet it is unnoticeable). I plan to write about it soon. Till then here is a clue: Import the Excel file from Power Query in a new Excel file/Power BI, and click the root node in the Navigator window (Don’t select any specific worksheet or table).

  3. Thanks, Nick. Nice column…I’ve got a system that puts out a new .csv each month (e.g., Sep.csv, Oct.csv, Nov.csv) and I’ve already combined those three files with Power Query. After the Dec.csv file is ready, can I put it in the same folder with the previous monthly three files and pull all four files into the data model with Power Query?

  4. I just implemented this solution this week. The first time I dropped an additional file in the folder and it auto-magically appeared in the data model, I went to grab coworkers to show them a miracle. In my case I am importing identical Excel files from a SharePoint Online folder, and my code is

    Source = SharePoint.Files(“https://mycompany.sharepoint.com/teams/atsmgrs”, [ApiVersion = 15]),
    #”Removed Columns” = Table.RemoveColumns(Source,{“Extension”, “Date accessed”, “Date modified”, “Date created”, “Attributes”}),
    #”Filtered Rows” = Table.SelectRows(#”Removed Columns”, each ([Folder Path] = “https://mycompany.sharepoint.com/teams/atsmgrs/Shared Documents/FY17 Targets/Sept2016/”)),
    #”Added Custom” = Table.AddColumn(#”Filtered Rows”, “ExcelData”, each Excel.Workbook( [Content])),

    Above steps, in English:
    – Get a list of documents and their properties in a sharepoint site
    – Remove some property columsn
    – Filter down to docs in a specific folder
    – Excel.Workbook() to crack open the XLS files and get to the data.

  5. What would happen if I try to refresh the query having changed London.csv with new data? What will happen with the old London.csv data (not available anymore in csv file)? Will the new London.csv data be appended or will it,overwrite old data?

    Hope my question is clear. Look forward to hearing from you.

    1. Hi Martin,

      The old London data will be deleted. If you want to append increments of data, you need to create a new file with the new data in a separate sub folder or a different name, e.g. London_xxx.csv, and the use the Query Editor to remove the suffix _xxx. You can remove duplicate rows if you have a unique key in each row, or by a combinations of columns.

  6. Please update this post with the data files used in the explanation and in future posts for this series. Thanks.

Leave a Comment or Question