In this post, we are looking at using Power Query to import all the files in a folder. We give Power Query a folder path, click a few buttons and it will import and combine all the files into a single table. That’s powerful stuff, right! If we want to add another file into the output table, we only have to save a copy of the file in the folder and click refresh, the new file will be imported too. This can save you hours and hours.
Before we get started on this technique, there is one point which I need to make you aware of. All the files to be imported must follow a similar structure and column pattern. Power Query is magic, but you’ve got to give it a reasonable chance.
The examples in this post use the following files:
- Example 10 – January 2019.csv
- Example 10 – February 2019.csv
- Example 10 – March 2019.csv
- Example 10 – April 2019.csv
- Example 10 – January 2019.xlsx
- Example 10 – February 2019.xlsx
- Example 10 – March 2019.xlsx
- Example 10 – April 2019.xlsx
Setting up the files for the examples
To work along with the examples in this post, start by moving the January and February files into a separate folder. These are the files we will import initially.
Do not include the March or April files, we will add them in later as part of the examples.
Import all CSV files from a folder
The easiest files to work with are CSV or text files, so that’s an excellent place to start. What makes them easy to work with?
- They always contain a single worksheet
- The first row is treated as the header row
- Merged cells are not permitted
In Excel, click Data -> Get Data -> From File -> From Folder
The folder window will open. Click Browse…, navigate to the folder to be used, then click OK.
A preview of the folder and file attributes is displayed. If the files are entirely uniform and there are no edits to make, we could just click Combine (drop-down button) -> Combine Close & Load. However, you’ll notice we have a mix of .csv and .xlsx files in the folder; we can’t combine these easily. Instead, we will click Transform Data and take a closer look at what’s happening here.
The Power Query editor opens. It does not show the data from the file as it has in the past, instead, it shows data about the files.
Our first transformation will be to keep only the files we wish to be combined (the CSV files for our current example). It is always good practice to filter to only include the files we need; you never know when another user might decide to save a random file within the folder.
Filter the Extension column to include only .csv.
From here we have two options:
- Use a Custom Column
- Combine the files
Which method you choose is up to you, and you may even decide to use different methods for different circumstances.
Method 1: Use a Custom Column
The first method uses a Custom Column containing a Power Query function.
Start by selecting the Content and Name columns, then click Home -> Remove Columns -> Remove Other Columns. This removes the file attribute columns which we do not require. We have kept the Name column to illustrate that using this method you can retain file attributes through into the final output.
Click Add Column -> Custom Column.
In the Custom Column window, input the following information
New column name: CSV
Custom column formula:
A new column called CSV is added to the preview window.
You will notice a new icon in the CSV column header. This is the Expand icon and indicates that the column contains more detail which can be expanded.
Click on the Expand icon, then click OK as we want to keep all the columns of data.
We have now expanded all the data from both files, and the January & February files are combined.
To tidy up the table, do the following:
- Remove the Content column
- Add the column headers to be Source File, Date, Customer, Product, Sold By and Value.
- In the Value column filter to remove the text of “Value”
- Set the data types for each column
Our preview window should now look like this:
That’s it all done, click Close & Load to load the Table into Excel.
Method 2: Combine the files
Now time for Method 2. This uses Power Query’s magic built-in combine process.
Assuming we didn’t do Method 1, the Preview window should look like this:
Did you notice the icon with the two down arrows at the top of the Content column? That’s the Combine Files icon, it drills from the file level into the content level, and combines each file. Go ahead and click it… you know you want to.
The Combine File window will open, showing a sample of the data using the first file in the query as an example. All looks good, so click OK.
Power Query will now do its magic. Once it has finished, we can see the files have been combined, that’s all it took, one button! As shown in the screenshot below, a lot of transformations and steps have been made to achieve this.
The preview data should look like this.
You can see that both the January and February data has been combined into this single table.
Click Home -> Close & Load to load the data into Excel.
Adding more files to the folder
I have created a Pivot Table to show that the January & February files have been combined.
Copy the Example 10 – March 2019.csv file into the folder.
Now for the moment of truth… in Excel, click Data -> Refresh All, the data will be updated.
“If it’s updated, why hadn’t the Pivot Table updated?” – Good question. The Pivot Table refreshed before the data finished loading from the query, so we have to click Data -> Refresh All again to refresh the Pivot Table.
To turn off the need to double refresh, follow my post on Data Refresh.
File uniformity is key to success
Now add the Example 10 – April 2019.csv file into the folder. Click Data -> Refresh All
Method 1 will update without any issues. Method 2 though will lead to an error.
Why did this happen? What went wrong? Open up the Example 10 – April 2019.csv and take a close look at the first row. In this file, the column called Customer which exists in the other files has changed to Sold To.
As part of the automatic steps, Power Query promotes the first row of a CSV file as the column headers. When we initially combined the files using Method 2, it used February as the sample file, therefore the column headers from February were set as the column headers for the whole query. Power Query has no way of knowing that April is different from the other files.
Method 1 does not use the column headers from the CSV files, but uses the relative column positions. Which is okay unless the column order changes in any files, in which case you could be combining different types of data without knowing it.
This is why file uniformity is key to the success of this technique.
Import Excel workbooks from a folder
The method to import Excel workbooks is almost identical to CSV files. But it introduces different challenges. We’ve already seen in previous posts that Power Query likes to hard code values into the query. This becomes a problem when all our Excel workbooks have worksheets with different names. Guess what… yep, all our Example files have different worksheet names.
Copy the Example 10 – March 2019.xlsx and Example 10 – April 2019.xlsx into the folder.
In Excel, follow these steps:
- Click Data-> Get Data -> From File -> From Folder.
- In the Folder window click Browse… navigate to the folder, then click OK.
- When the list of files appears, click the Transform Data button.
- The Power Query Editor window opens.
- On the Extension column filter to only include the .xlsx files
Let’s take a look at the same two methods as before
Method 1 – Use a Custom Column
Method 1 uses a Custom Column formula as before.
Select the Content and Name columns, then Remove Other Columns.
Now let’s add the new column, click Add Column -> Custom Column.
In the Custom Column window, input the following information
New column name: ExcelFile
Custom column formula:
A new ExcelFile column is added. Click on the Expand icon, then click OK to include all the columns.
Every worksheet and table from the workbook is listed. The April file contains a worksheet and a table, so filter the ExcelFile.Kind column to remove the Tables and retain only Sheets.
Select the ExcelFile.Data column, then Remove Other Columns. Click the Expand icon on the ExcelFile.Data column.
Look at the Preview window – Ta-Dah! All the worksheets are now combined. Next Close & Load to get the data into Excel.
Method 2: Combine the files
Now time for Method 2. This uses Power Query’s built-in combine process again. Assuming we didn’t do Method 1, the Preview window should look like this:
Click the combine icon as we did when using Method 2 for CSV files.
As Example 10 – April.xlsx was first in the list of files, Power Query will use this as it’s Sample File. On the Combine Files window, select the April sheet and click OK.
If you all the worksheets have the same name, then you’re in luck. All the data will be combined into a single table.
But for our scenario, as you might expect, only the April data will load into the Preview window. There is only one workbook with a worksheet called April, so only one worksheet will be loaded. We have to find the location in the M code where the sheet name has been hardcoded and adjust it.
Open the Transform Sample File query.
Click on the Navigation step in the applied steps.
The M code for this step looks like this:
Delete the highlighted section. The M code should now be this:
Go back to the main query and look at the Preview window again. All the worksheets are now combined – that’s magic.
Finish off the query
We’ll need to make some transformations to tidy things up.
- Remove the top 3 rows
- Promote the first row as the header by clicking Transform -> Use First Row as Headers
- Select the last column and filter to remove “null” and “Value”.
- Apply the correct data types to each column.
Finally, Close and Load the query into Excel.
Notes & tips
Here are a few bonus tips:
- When looking for files, Power Query will look in sub-folders too. This is great when files are organized within a defined structure. But if the folders are in a bit of a mess, then a bit of housekeeping might be required to get this method to work.
- Plan for what might go wrong, think about what might happen if somebody else were to save a different file type in the folder. When you create the query, include extra filter steps to ensure only the files you want are imported.
- Data which comes from an IT system normally works well with this technique. I find that any data which requires user input tends to cause problems… why? Because people change things, they add columns, they rename sheets, they type over static values. If your files come from user input, be prepared to use the combine and append method.
- Import Data
- Data Refresh
- Edit Queries
- Close & Load Options
- Using Parameters
- Basic Transformations
- Combine / Append Queries
- Import All Files in a Folder
- List All the Files in a Folder and File Attributes
- Import Data from the Current Workbook
- Import Data from the Web
- Unpivot Data
- Unstacking Data in a Column
- Lookup Values using Merge
- Change the Source Data Location
- If Statements for Conditional Logic