IMPORTANT: You can read the official documentation that Microsoft has on this topic from the following link (url).
I’ve previously done a series on Merge / JOIN operations (First Part here) and it’s now time to do one on Combine / Append operations.
so…How do you combine / append / stack tables with Power BI / Power Query?
There are multiple ways to accomplish this, but we’re going to start with the basics.
Scenario: Combine Data from 2 CSV files
Imagine that we have 2 CSV files.
- Sales data for the year 2017
- Sales data for the year 2018
In reality, our data sources can be any. It can be 1 CSV and 1 Excel file, maybe 1 table from a database and another one from an API.
To Power BI / Power Query, it’s a bit indifferent as they’ll go through the same process regardless of their data source, so what I’m about to show you applies to every single data source possible within Power BI / Power Query.
We want to append both of these into just 1 table. How do we do that?
Connecting to the Data
This is a pretty straightforward process where we can just select our data source from the “Get Data” window and go through the import process which will display a simple browse window so you can select your file and then give you a preview of your data like this:
To finish things off, you can just click on Edit so we can go straight to the Power Query Editor.
This is pretty simple and in our case our csv files have the correct format thus not requiring any transformation steps to be done to it other than the ones that were automatically done by Power BI / Power Query.
We repeat this process for the 2018 file and then we’ll end up with 2 queries:
Those are the queries or tables that we want to stack on top of each other. How do we do that?
Combining / Stacking / Appending Tables
This is truly the easiest part, now all you need to do is find the button that reads Append Queries and then a new window will appear where you can combine all the queries that you want.
It doesn’t need to be just 2 tables, it can be as many tables as you want, which means that next year when I get the data for 2019, I can just create a new query and add this to this list of tables that need to be combined and I’ll be set and done.
The output of this will be a table with all of the columns from all the tables that we combined and all of the data from both.
When you first create the queries, they’ll be, by default, set to load themselves to your Data Model or created as new tables inside of Excel if you’re using Power Query for Excel.
This might create some data duplication which is completely unnecessary. To stop this from happening, you can change the behavior of the Sales 2017 and Sales 2018 queries to be Connection Only.
You can simply right click the query and disable the Load by unchecking the Enable Load option.
Data Privacy and Query Folding
When combining your data it follows the sample guidelines for Data Privacy (here) and Query Folding (here) as showcased in the previously written articles. This means that you need to be aware of how these concepts play out in your specific scenario, so I highly recommend that you read the articles in this topic in case you’re hitting any errors or performance issues.
The Table.Combine Function
When you go through the process described in this post, Power Query will end up using a function called the Table.Combine function.
When Power Query uses this function, it resembles the same experience that happens when you do a SQL UNION ALL. This means that Power Query will try to find matching columns from all of the tables and stack data accordingly, but if one table has columns that the other doesn’t, then it’ll fill any empty spaces with null values.
This function is BY FAR the best method when it comes to appending data as it’s a function that was designed to make this happen.