Data doesn’t always come in one single file, or from one download. We often have to find a way to stitch multiple files together so that we can use it as if it were a single data source. This might be cost center reports, monthly data extracts, product profiles, survey data, etc. Basically, it applies to any data which comes from multiple sources.
Power Query gives us the power to combine queries or append one query into another. And don’t forget we can refresh all the data sources with a single click of Refresh All. This means we can put an end to tedious copy and paste routines for combining multiple files together.
In this post, we will combine specific Excel files, but the approach is the same for CSV or other file types. In fact, we can combine files of different types from different sources into a single table. The example below is quite basic; it’s there to show the principles. But once you have the principles you’ll be able to adapt it to your scenarios.
Download the example files
The examples in this post use the following files:
- Example 9 – January 2019.xlsx
- Example 9 – February 2019.xlsx
- Example 9 – March 2019.xlsx
- Example 9 – April 2019.xlsx
We will work through all the essential steps from start to finish, so get Excel fired up, let’s get going.
Create the first query
Open a new workbook in Excel; this will be the place where the combined data will be loaded to.
Click Data -> Get Data -> From File -> From Workbook. If your data is stored in another file format, select the appropriate option.
Navigate to the Example 9 – January 2019.xlsx file from the downloads and click Import.
From the Navigator window, select the worksheet which contains the data (which is the January worksheet in our example), then click Transform.
The Power Query editor will open, showing a Preview of the data.
There are a few simple transformations we need to get this into a useable format.
- Remove the top two rows by clicking Home -> Remove Rows -> Remove Top Rows. Enter 2 into the Remove Top Rows window and click OK.
- Promote the first row of data to the header by clicking Transform -> Use First Row as Headers
- Add a month end date column by clicking Add Column -> Date -> Month -> End of Month
- Add a column containing the week number. Click Add Column -> Date -> Week -> Week of Year
- Rename the Week of Year column to Week
- Move the End of Month column to the start by Right-clicking on the End of Month column header, select Move -> To Beginning from the menu.
- Check the following data types have been applied, if not, do it manually.
- End of Month = Date
- Date = Date
- Customer = Text
- Product = Text
- Sold By = Text
- Value = Whole Number
- Week = Whole Number
The preview window should look like this:
That’s all the transformations we need. Click Home -> Close and Load To…
On the Import Data window, select Only Create Connection, then click OK.
The Queries & Connection options will show the January query (click Data -> Queries & Connections if this section is closed).
Duplicate and edit the query
For our example, we don’t want to go through the same transformation steps again. Instead, we will duplicate the January query, then change the steps to work for the Example 2019 – February.xlsx file.
Open the Power Query editor by double-clicking on the January query in the Queries & Connections window. Expand the Queries navigation section (to the left of the Preview window), right-click on the January query and select Duplicate.
Rename the duplicated query to February.
The first two steps of our example have January as hardcoded values within the M code, so we must edit these steps. We need to change any references from January to February. How you edit these is up to you. Either click the settings icon, next to the step, or edit the M code directly in the Formula Bar or Advanced editor.
In terms of the M code, the changes are as follows:
= Excel.Workbook(File.Contents("C:\Users\marks\Power Query Examples\Example 9 - January 2019.xlsx"), null, true)
= Excel.Workbook(File.Contents("C:\Users\marks\Power Query Examples\Example 9 - February 2019.xlsx"), null, true)
That’s it. Click Close and Load To… to load the February query as Create Connection Only.
The Queries and Connections will now show the January and February queries.
Combining or appending queries is reasonably straight forward. Open the Power Query editor. Click Home -> Append Queries (drop down) -> Append Queries As New
The Append window opens. There are two views, the view for combining two queries, or the view for combining three or more queries. Both are straight forward to use.
Two table combine settings
Three or more queries combine settings
After clicking OK, a new query is created, probably given the name Append1. Rename the combined query to something meaningful.
Click Close and Load to load the query into a new worksheet.
Take a look at the new Table in Excel. It contains both the January & February data. I have added a Pivot Table showing the Value by Week.
There are now three queries within the workbook; January, February, Combined Sales Data
Next, try adding the Example 9 – March 2019.xlsx file on your own. Follow the same steps above to create the connection, then edit the combined query to include and March data. You can use the three or more tables option in the Append window.
The data and Pivot Table would now look like this, with the additional weeks from the March file added.
When column headers are different
The Example 9 – April.xlsx file is slightly different from the others. The Customers column header has changed to Sold To.
If we follow the same steps as above, the query will break at any steps where the column header is used within ta ransformation. In our example, the query breaks first when changing the data type.
Power Query can’t find a header called Customer as it is not called Sold To in the source file. There are two steps which reference the specific column header. Without these steps we would not have known there was an issue.
For the purposes of illustration, change the code for each step so that the Customer becomes Sold To. Then Close and Load the April query as a Connection Only.
Next, follow the same instructions as above to add the April query into the combined query.
Look at the screenshot below. The Customer and Sold To columns were both the 3rd column in the query, yet they have not been combined. This is a crucial point to note. Power Query uses the column headers to determine which columns to combine, not the position within the query.
To make this combination work correctly, we could add a step into the April query to change the Sold To column header to Customer. That would provide consistent headings, and allow the columns to combine correctly.
Combining files of different types/data structures
We saw above that when combining queries, columns with the same header are combined into the same column. Therefore, we can combine files from lots of different sources into one query. As an example, imagine you have actual data from an accounting system and ten files from different cost center owners outlining their budget for the next year. This is no problem for Power Query. Create different queries for each source file, all of which must end up with the same columns, then combine the queries into a single table.
Next, we’ll look at how we can automate even more of this by combining all the files in a folder.