How to append multiple CSV files from a folder in Excel

Last month the Power BI team at Microsoft released an enhanced “combine binaries” experience, that I covered here last month. That new functionality allows you to easily combine multiple CSV files (and other file types) from a folder and utilize their filenames as a column in the final result (which is awesome – if you disagree, you don’t understand it yet, go read the post! Smile ).

As I have committed to begin a series on CSVs, it’s time for the second part of the series. I think that today’s blog post will be quite helpful for many of you. So without further ado, let’s begin part #2 of the CSV series.

Great, we have a simplified append of CSV files in Power BI, but what we can do in Excel?

While we patiently wait to have the same functionality in Excel, in today’s post I will show you how you can create relatively simple query functions to resolve our challenge and combine multiple CSV files from a folder while extracting essential information from their associated filenames. But first, let’s review three possible scenarios, and how to combine CSV files from a folder on each.

The Basic Scenario – No contextual data in filenames

If you have a folder with many CSV files that share the exact format, you can could append them all into a single table in Excel file.

Here is an example of such CSV files. Each file contains different Star Wars characters, and we want to append them all. Instead of massive repetitions of copy and paste, or a manual import of each file, here is the Power Query’s game changing way to do it.

How to append multiple CSV files from a folder in Excel

If you have Excel 2016 In Data tab, click New Query, select From File and then click From Folder. If you are on Excel 2010 or 2013, install Power Query Add-In (download it here) and in the Power Query tab, click From File –> From Folder.

How to append multiple CSV files from a folder in Excel

Click Browse, and select the folder that contains the CSV files. Click OK in Browser For Folder, then click OK in Folder.

How to append multiple CSV files from a folder in Excel

 

If you see this preview window, click Edit.

How to append multiple CSV files from a folder in Excel

The Query Editor window will appear. This is the main screen to apply transformations on your data. When we’re done, clicking Refresh All in the Data tab (of the ribbon) will automatically load the data from any new and/or modified files and transform it to our desired format.

The next step is where the real magic begins.

Magic – Squishing All the Files Together into One Table

(Note from Rob:  yes, we’ve covered parts of this on the blog before, BUT this is a lesson that bears repeating, shouting from the rooftops, and seeing from many perspectives – PLUS it sets up the second half of this post, so this is all very valuable.)

Unfortunately, the portal to the magic land is well hidden. On the left side of the header of the first column you will notice a small button with two arrows. Click on it.

 clip_image023

The result: All the CSV files are appended to a single table. We are still inside the Query Editor and can see a preview of the transformation. But there is still some work for us. Notice that each file contains the header names, so the appended result has all the headers as rows in the table.

How to append multiple CSV files from a folder in Excel

In Transform tab, click Use First Row As Headers.

How to append multiple CSV files from a folder in Excel

The last step promoted the headers of the first CSV files as the headers of the unified appended table.

Removing the Header Rows from the Other Files

Above, we “promoted” the header row from the first file, so that one is taken care of, but we still need to remove the header rows from the other CSV files. To do it, we can apply a simple filter on the first column.

Click the filter button in the header of the first column (In my case column Name), and uncheck the header name in the filter pane (In my case – Name, as highlighted below). When you click OK in the filter pane, the header rows will be filtered out.

How to append multiple CSV files from a folder in Excel

 

Note: In some unexpected cases, unchecking a value in the filter pane creates a dangerous logic that will yield unexpected results. Make sure you activate the Formula Bar of the Query Editor, and confirm that the formula explicitly excludes the value that you unchecked (Keeping the Formula Bar invisible by default, is one of the most common mistakes you can do in Power Query. Read more here – but short version is that you WANT to see something like ‘[Name] <> “Name” ‘ – if you see ‘[Name] = “R2-D2” or [Name]=”R5-D4” or… ‘ then you probably want to change it to the single “<>” version).

In case your CSV files overlap and you think you have duplicate rows, you can easily remove the duplicates by selecting the columns, whose combined values are unique (In my case it’s column Name). After you select the column/s, right click on one of the headers, and select Remove Duplicates.

How to append multiple CSV files from a folder in Excel

We are done. We can now click Close & Load in Home tab to close the Query Editor to load the append data into Excel.

How to append multiple CSV files from a folder in Excel

And here are the results.  Of course, we’d use “Load To” in order to get this into Power Pivot’s data model in most serious cases, but this time I’ll just drop it into Excel for simplicity:

How to append multiple CSV files from a folder in Excel

Second Scenario – The contextual information is in first row of each CSV file!

(Note from Rob:  YEAH, this is what I’m talking about!  Even I learned some good stuff while reviewing this.  Glad you stuck around? Smile)

What if instead we want to append CSV files that share the same format, but each file has a unique context that is crucial our analysis? For example, imagine we have sales data from three regions of our company: London, New-York and Paris.

How to append multiple CSV files from a folder in Excel

How can we keep the region after we append the data from the three CSV files?

In this section, we will learn how to append the data from all the regions, when the region name is given in the first cell of each file.

 

How to append multiple CSV files from a folder in Excel

After we select our folder, we should click our magic button again:

How to append multiple CSV files from a folder in Excel

In the next steps, we will now transform the tables, and move the regions from the first row of each file to a new column, as illustrated here:

How to append multiple CSV files from a folder in Excel

In Add Column tab, click Conditional Column.

How to append multiple CSV files from a folder in Excel

In Add Conditional Column dialog, Select Column2 as Column Name, equals as Operator, and “” as Value. Then, change the type of Output to Column, and set Column1 as Output. When you’re done, click OK.

clip_image046

Bummer:  We Have to Edit the Formula Just a Touch

Note: The window above doesn’t support blank values as an input – YUCK. So we will need to fix the formula. If this is your first time using the Query Editor, and you don’t see the formula bar, go to View tab and check the Formula Bar checkbox.

In the formula bar, replace the long sequence of double quotes to two double quotes “”.

Here is the original formula (before the change):

clip_image048

Here is the modified formula:

clip_image050

Filling it Down (Yes, Filling, Not Filing)

In the next step, we will fill down the region values, so each row will include its corresponding region.

Select column Custom and right click its header. Select Fill and then select Down.

clip_image052

Click on the filter icon of Column2 and select Remove Empty.

clip_image054

In Transform tab, click Use First Row As Headers.

clip_image056

Now we can remove all the header rows. Click the filter icon on the first column (In my case – Date), scroll down till you see the column name as a value (In my case – Date) and uncheck it, then click OK.

How to append multiple CSV files from a folder in Excel

We can rename the last column to City or Region.

How to append multiple CSV files from a folder in Excel

In the last two steps we can change the column types of Sales and Date, by clicking the small ABC icon in the headers and select Decimal Number for Sales and Date for Date.

How to append multiple CSV files from a folder in Excel

How to append multiple CSV files from a folder in Excel

That’s it. We can now close the Query Editor. In Home tab click Close & Load.

We were able to append all the sales data and keep the region/city information. You will find this method useful in many scenarios where your CSV files contain meta-data before the actual tables. Copying the data to a new custom column, and filling it down will do the trick.

How to append multiple CSV files from a folder in Excel

Third Scenario – The context is in the filename

(This is the part that is just like my previous post, but we’re doing it in Excel rather than Power BI)

So what should we do if our meta-data (like region names) isn’t available in the CSV itself, but in the filename. In our example, all the CSV files contain the sales data, and the region/city name is in the filename, as shown here:

How to append multiple CSV files from a folder in Excel

Till the improved Combine Binaries feature in Power BI is released in Excel, we cannot click the Combine Binaries button in the header of column Content to append the data without losing the crucial information in the filenames.

Here is the solution:

After you import the folder, follow the steps below.

Select the first two columns Content and Name. Right click on one of the headers and select Remove Other Columns.

How to append multiple CSV files from a folder in Excel

In Home tab, click New Source, then select Other Sources and select Blank Query.

How to append multiple CSV files from a folder in Excel

Rename the new query to LoadCSV.

Click Advanced Editor.

How to append multiple CSV files from a folder in Excel

Paste the following code in the Advanced Editor and click Done.

(content as binary, filename) =>

let

    #”Imported CSV” = Csv.Document(content,[Encoding=1252, QuoteStyle=QuoteStyle.None]),

    #”Promoted Headers” = Table.PromoteHeaders(#”Imported CSV”),

    #”Added Custom” = Table.AddColumn(#”Promoted Headers”, “filename”, each filename)

in

#”Added Custom”

 

How to append multiple CSV files from a folder in Excel

Note: Keep the code above. You will be able to reuse it whenever you need to append CSV files and their filenames from a folder.

In Add Column tab, click Invoke Custom Function.

How to append multiple CSV files from a folder in Excel

In Invoke Custom Function dialog, select LoadCSV as Function query.

 How to append multiple CSV files from a folder in Excel

Select column Content as content, select Column Name as the type of filename (optional).

How to append multiple CSV files from a folder in Excel

Select Name as filename (optional), and click OK.

How to append multiple CSV files from a folder in Excel

We can now remove the first two columns, by selecting the third column, right clicking on its header and selecting Remove Other Columns.

How to append multiple CSV files from a folder in Excel

There is another magic button that we can use in the header of column LoadCSV. This button will expand all the tables we extracted from the CSV files, and will transform them into a single table – with the region/city context intact. Click on the icon (highlighted below), uncheck Use original column name as prefix, and click OK.

How to append multiple CSV files from a folder in Excel

We can now manipulate the filename values to represent the cities by removing the extension from the filename).

To remove the .csv extension and keep the city names, select column filename and right click on its header. Then, select Replace Values..

How to append multiple CSV files from a folder in Excel

In Replace Values dialog, set .csv as Value To Find, and click OK. (Note that we didn’t provide any text to replace with, as we want to delete this extension).

How to append multiple CSV files from a folder in Excel

We can now rename the column filename to City, and change the types of Date and Sales as we did in the previous section. Finally, clicking Close & Load in Home tab will provide the expected results.

How to append multiple CSV files from a folder in Excel

From now on, any additional CSV files that you’ll add in the folder will be automatically appended when you click Refresh All in Data tab.

Conclusions

Following this blog post, I hope that you’ll never again manually append multiple CSV files into your worksheet. The Power Query way is relatively easy, and it allows you to refresh the data when new CSV files come in.

Would you like to improve your Power Query skills? Follow my new series on 10 common mistakes you do in Power BI and Power Query, and and how to avoid the Pitfalls here.

  Subscribe to PowerPivotPro!
X

Subscribe

Gil Raviv

Principal Consultant at PowerPivotPro and a blogger at DataChant.com. Former Senior Program Manager on Microsoft Excel team. Led Power Query integration in Excel 2016. 

This Post Has 30 Comments

  1. Hi Gil, Thanks for the good post.
    Just a remark on the 2nd scenario, I don’t think it’s great to promote headers after filling down as a variable will be put in headers (London), then when you rename it, it looks for a column called “London”, which is not very flexible if you change the order of your files. Better to rename , Column1, Column2… and then filter out “Date” values in first column.
    Olivier.

    1. Good catch Olivier, and thank you for the feedback. You can find a possible solution is in my reply to SP below. I am not sure I followed your proposed solution (“Better to rename, …”). Could you please elaborate?

  2. In the step where you rename the last column to “City”, this works if you have “London” as your first CSV file but what if you don’t or what if the order of CSVs changes? Is there a way to just refer to the “last column” rather than referring to it by name with the Table.RenameColumns() function?

    1. Thank you SP for the great feedback. I agree with you. The flow that I shared above was simplified. To improve the query, and make it robust to changes in data, the best way is to access the last column name by index. Here is how you can get the column name: List.Last(Table.ColumnNames(#”Last Step”)). This code can be used inside the rename function. To get the name of column i, you can use the code Table.ColumnNames(#”Last Step”){i-1}.

      So the entire code to rename “London” column above a robust way is as follows (assuming the last step is called #”Last Step”):
      = Table.RenameColumns(#”Last Step”, { { List.Last( Table.ColumnNames(#”Last Step”) ), “City” } } )

      My next post as part of the Pitfalls series will include this solution as well: http://datachant.com/tag/robust-queries/

  3. Thanks for a clear and concise example, Gil. I recently began using PowerQuery to combine csv or other files for processing and have been able to replace hundreds of lines of VBA code that convert Excel budget template input data, (from hundreds of templates), into system load files, (basically automated ETL.) I never looked back once I found the “unpivot” feature in PowerQuery to convert cross-tab type input data into a flat file and the Group feature to summarize it to a load file.

    In the not so distant past, I would go old school with a DOS batch file, e.g., Copy c:\Data\*.csv c:\Data\Combined.csv. This quickly and effectively combines all of the csv files in the Data folder into one “Combined.csv” file, although I usually add filename wildcards to append only selected csv files, so that the previous “combined.csv” file is not included in the next combine… Also in practice, I usually do this from an Access database that at the click of a button writes out a new combine file batch file each time and executes it with updated date or other parameters selected from a form to combine only selected csv files.

  4. Nice! Couple of things I bumped into.

    I’m of Office365 Excel 2016 and for some reason the Add Column tab does not have Invoke Function button. I added a call to the function manually in the Add Column Dialog: =LoadCSV([Content],[Name]).

    My other issue is that I am loading Tab Delimited files. I added the delimiter to the function like this:

    #”Imported CSV” = Csv.Document(content,[Delimiter=”#(tab)”, Encoding=1252, QuoteStyle=QuoteStyle.None]),

    Really nice work! I never would have figured out all of this on my own.
    Thanks very much….

    1. J-K-L you will probably get the invoke function soon. I use the First Release version, and I thought that the Invoke Function is already out. Glad that you figured it out.

  5. Hello, This tutorial is great! I do have one question. When importing the data using the Power Query, it does not give me the option to set a column as Text and converts long strings of numbers to scientific notation. Is there a way to select certain cells to maintain ‘Text’ or ‘Number’ when you click the double arrow button? Thanks!

  6. Hi guys, quick question. So I am combining a lot of csv files using PQ into several data models. This has worked perfectly for some time now. After recent O365 updates, i have come into some issues. First and foremost, output from PQ is now in a single column, meaning unparsed. Delimiting this, via semicolon, returns a lot of ‘null’ cells. This is a serious setback, and more or less ruins any new models i create. Curiously, the older ones work perfectly even when adding new csv files to the folder. Please advise.

      1. Hi Gil.

        Thanks for quick followup. Below you’ll find the string from editor:

        let
        Source = Folder.Files(“C:\Temp\Margin Analysis\12.2016\Sales 2016 – Copy”),
        #”Invoke Custom Function1″ = Table.AddColumn(Source, “Transform Binary from Sales 2016 – Copy”, each #”Transform Binary from Sales 2016 – Copy”([Content])),
        #”Renamed Columns1″ = Table.RenameColumns(#”Invoke Custom Function1″, {“Name”, “Source.Name”}),
        #”Removed Other Columns1″ = Table.SelectColumns(#”Renamed Columns1″, {“Source.Name”, “Transform Binary from Sales 2016 – Copy”}),
        #”Expanded Table Column1″ = Table.ExpandTableColumn(#”Removed Other Columns1″, “Transform Binary from Sales 2016 – Copy”, Table.ColumnNames(#”Removed Other Columns1″[#”Transform Binary from Sales 2016 – Copy”]{0}))
        in
        #”Expanded Table Column1″

        Now, due to full panic mode, I did try it in PBID. And it worked as expected.

      2. So I looked in the editor of the working models, based on the same data with same structure, and this is revealed:

        let
        Source = Folder.Files(“C:\Temp\Margin Analysis\12.2016\Sales 2016″),
        #”Combined Binaries” = Binary.Combine(Source[Content]),
        #”Imported CSV” = Csv.Document(#”Combined Binaries”,[Delimiter=”;”, Columns=32, Encoding=1252, QuoteStyle=QuoteStyle.None]),
        #”Promoted Headers” = Table.PromoteHeaders(#”Imported CSV”),
        #”Changed Type” = Table.TransformColumnTypes(#”Promoted Headers”,{{“Calendar day”, type date}, {“Buying group (Dyn.)”, type text}, {“Buying group (Dyn.)_1”, type text}, {“Sold-to party”, type text}, {“Sold-to party_2”, type text}, {“Postal Code”, type text}, {“Ship-To Party”, type text}, {“Ship-To Party_3”, type text}, {“Responsible Sales Person (Dyn)”, type text}, {“Responsible Sales Person (Dyn)_4”, type text}, {“Shipping conditions”, type text}, {“Shipping conditions_5”, type text}, {“E-Business Indicator”, type text}, {“Cust. p. order type”, type text}, {“Cust. p. order type_6”, type text}, {“Plant”, type text}, {“Profit Ctr(Man)(Dyn)”, type text}, {“Profit Ctr(Man)(Dyn)_7”, type text}, {“Project”, type text}, {“Project_8”, type text}, {“Sales document”, type text}, {“Sales document_9”, type text}, {“Item category”, type text}, {“Sales doc. type”, type text}, {“Pref. vendor (Dyn.)”, type text}, {“Pref. vendor (Dyn.)_10”, type text}, {“Material”, type text}, {“Material_11”, type text}, {“Electrical no. (Dyn)”, type text}, {“Revenue”, type text}, {“Actual cost price”, type text}, {“Charge backs”, type text}}),
        #”Replaced Value” = Table.ReplaceValue(#”Changed Type”,” NOK”,””,Replacer.ReplaceText,{“Revenue”, “Actual cost price”, “Charge backs”}),
        #”Replaced Value1″ = Table.ReplaceValue(#”Replaced Value”,”.”,””,Replacer.ReplaceText,{“Revenue”, “Actual cost price”, “Charge backs”}),
        #”Changed Type1″ = Table.TransformColumnTypes(#”Replaced Value1″,{{“Revenue”, type number}, {“Actual cost price”, type number}, {“Charge backs”, type number}}),
        #”Filtered Rows” = Table.SelectRows(#”Changed Type1″, each ([#”E-Business Indicator”] = “#” or [#”E-Business Indicator”] = “X”))
        in
        #”Filtered Rows”

        1. You can try editing the CSV function with the semicolon and number of columns, and see if it helps. Do your CSV files has semicolon has delimiters? You should consider sharing it in PBI community forum. Sounds like a bug.

          1. Will do, thanks. The problem is that some of the masterdata has semicolon in its cells, so going the delimiter way is not optimal right now.

            For fun I created just two csv files in a small 30×4 grid, and did the combine binaries. Also turned out in one column.

            Will try the community and maybe dare myself in the editor mode. Thanks for quick followups.

          2. Look at that, it appears latest build 7766.2047 prompts for data type detection, like in PBID. Problem solved. Thanks guys.

  7. Nice, I see this all the time in my group. I understand the need for UI use but appending files in a folder is as simple as adding a custom column with this formula after connecting to the folder:

    =Table.PromoteHeaders(Csv.Document(File.Contents([FolderPath]&[Name]),[Delimiter=”,”,Encoding=1252]))

    Then expanding the column of tables created. 🙂

    1. I’m trying to understand the solution to problem I share with RG — “Appending .csv folder in Power Query no longer working properly” – returning single column

      1. Hi SKG. For me, updating O365 solved it. Now, like in the desktop version, it prompts you for data type. It appears I’m on build 7870.2020 now.

  8. Hello Gil, I am currently a student doing a computer project on Star Wars. I was wondering if you could send me that csv file on star wars.

  9. Thanks for the great post Gil. My files were .txt and not CSV, therefore had to remove the Promoted Headers command for it to work.

    (content as binary, filename) =>
    let

    #”Imported CSV” = Csv.Document(content,[Encoding=1252, QuoteStyle=QuoteStyle.None]),
    // #”Promoted Headers” = Table.PromoteHeaders(#”Imported CSV”),
    #”Added Custom” = Table.AddColumn(#”Imported CSV”, “filename”, each filename)

    in

    #”Added Custom”

  10. Hi Gil, I am a business analyst (not a coder) and I tried using your code above to capture the file name, however, I am getting a “Token Identifier Expected Error” in the Query when I am attempting to use (content as binary, filename) =>

    let

    #”Imported CSV” = Csv.Document(content,[Encoding=1252, QuoteStyle=QuoteStyle.None]),

    #”Promoted Headers” = Table.PromoteHeaders(#”Imported CSV”),

    #”Added Custom” = Table.AddColumn(#”Promoted Headers”, “filename”, each filename)

    in

    #”Added Custom”

    My data will not have any headers and has a unique extension. Not sure if that matters.

    Appreciate your input.

    Thank you.

  11. Hi Gill,

    Thanks for this post. I have one problem. It’s working fine on my local computer. however if I put this on another location then data doesn’t load. Is there a way to put relative path in Source function.

    Source = Folder.Files(“Relative Path″)

    Thanks in advance.

Leave a Comment or Question