Looking back at this Power Query series, I can see that I have omitted to post about one of the more common and more powerful features; which is Group By. We have used it a few times already, but in this post, I wanted to give it a more detailed look.
In Excel, there are many ways to group and summarize data. We might use techniques such us:
- Functions – SUMIF(S), SUMPRODUCT, or the new Dynamic Array functions
- Pivot Tables
- Subtotal feature (Data -> Subtotal)
Now there is a new method… the Group By feature in Power Query. Let’s learn how to use it.
To keep things easier, we will use the same file as the previous part.
The examples in this post use the file called Example 15 – If Function.xlsx
The example file contains just one Table. Select a cell in the Table and click Data -> From Table/Range to load the data into Power Query.
Once in the Power Query Editor, the Group By feature is the first icon on the Transform tab: Transform -> Group By. The icon appears elsewhere, in the user interface, but this is the easiest to find.
Group By comes in basic and advanced forms, though the advanced options are pretty easy by themselves.
Basic Group By
The basic form provides a single column option and a single operation. To display the basic options, ensure the Basic button is selected (as shown in the screenshot below).
If working along with the example, select the following options (these are also shown in the screenshot above).
- Column to use: Product
- New column name: Average Sale
- Operation: Average
- Column: Value
This operation calculates the average value of each unique item in the Product column. Click OK.
The Preview Window will look like this:
There are lots of operators for us to use:
- Count Rows
- Count Distinct Rows
- All Rows (which creates a table of each grouped item)
Advanced Group By
The advanced options give us the ability to use as many columns as we want. Click the Advanced button to display the options.
For this example, let’s say we wanted the total and count of sales transactions by Product and Customer.
Set the first box equal to Product. Click the Add Grouping button, and set the second box equal to Customer.
In the new column name section, apply the following settings:
- New column name: Total Sales
- Operation: Sum
- Column: Value
Click the Add aggregation button, then set the new column as follows
- New column name: Transactions
- Operation: Count Rows
- Column: [not required for Count Rows]
The Preview Window now looks like this:
This shows the data aggregated by the combined Product and Customer columns. While each column may contain repeats, each pair is unique.
Hopefully you can agree that Group By is a simple, yet powerful feature.
In my mind, Power Query is about preparing data for use in Excel or a Data Model. Therefore Group By is a conceptually tricky feature. It’s excellent for creating summarized reports, but we don’t necessarily use Power Query to create reports in that way. Therefore, I tend to use Group By to reduce down the amount of information down to the lowest level of granularity required, rather than for creating reports.
- Import Data
- Data Refresh
- Edit Queries
- Close & Load Options
- Using Parameters
- Basic Transformations
- Combine / Append Queries
- Import All Files in a Folder
- List All the Files in a Folder and File Attributes
- Import Data from the Current Workbook
- Import Data from the Web
- Unpivot Data
- Unstacking Data in a Column
- Lookup Values using Merge
- Change the Source Data Location
- If Statements for Conditional Logic