Miguel Escobar Published April 30, 2019

Combine or Append Data: Optimal Combination Pattern

Power BIPower Query

This is going to be the last post series in the series on Combine or Append Data.

In the first post we saw the basics of how to do the Append operation through the UI.

In the second post we saw the Combine Files experience with Flat Files and how easy it is to combine as many files as you want.

In the third post we had a contrast of the Combine Files experience using Excel Workbooks instead of simple flat files and what things we needed to consider this time that we didn’t consider with simple flat files.

In this fourth and last post we’ll be going back to the basics using the function that we discovered in the first post – Table.Combine which is the most optimal function for combining / appending data.

Why this approach?

Don’t get me wrong, the other approaches are completely valid, but in some cases they might add some overhead to the whole process which is unnecessary.

I’m talking specifically about the Combine Files experience which does improve the end-user experience by adding some overhead, but if your goal is to improve performance, then you could potentially make a compromise on that end-user experience to optimize your query and make it run up to 500% faster.

No – I’m not saying that we should use exactly the same approach that we did on the first post in this series.

That would be too time consuming and it simply won’t scale, but what about a combination of that Append approach and the Combine Files?

This is more for advanced users rather than your average Power Query user, but it does have a huge impact if your goal is to improve performance and this is usually what I personally do when I get hired to optimize queries from my clients.

The Setup

I’ll be using the same dataset that we used in the 2nd post for the Combine Files using multiple flat files from a series of folders.

The end goal is to use a function called Table.Combine to combine all of the files instead of using the normal Combine Files approach which uses a function called Table.ExpandTableColumn.

So, how does the Table.Combine function work? Good question! This function requires a List of Tables so it can append all of those tables into just one big table.

You can learn more about what lists, tables and other structured values are inside of Power Query and the M language from here.

Let me show you how this new pattern would look like, but please take in consideration that the end goal is the same every time, but the journey  to it might be different depending on where you’re starting at and what “checkpoints” you need to make.

Step 1: Connecting to the Folder

Similar to what we’ve seen before, we’ll be just connecting to the folder:

You can download the files that I’m using by clicking on the button below:

Just connect to the folder where your data is being stored and you’ll get this window:

We now click the Edit button and head over to the Power Query Editor. Remember, we’re taking the same assumptions and notes that we did in the previous post (post 2).

Up until now, we’re basically doing exactly the same that we’ve done before, but things are about to change now:

Step 2: Buffering the Binaries

Since we’re working with files, is usually a good idea to buffer them (save them in memory) so it’s easier / faster for us to read the files and combine faster and for that, I’ll be deleting all of the columns except the Content column and then I’ll right click that column to do a drill down operation:

image

This operation that I’m doing will transform this column into a list. You can learn more about this process and the whole Navigation approach from this article.

After that, I’ll go to the formula bar and simply edit that so I can modify the formula to be like this:

= List.Transform ( #”Removed Other Columns”[Content], Binary.Buffer)

image

In essence, I’ve applied a function to each row in that list. That function is called Binary.Buffer and it does the caching of the file itself so things should run faster.

Step 3: Create a Custom Function

You’ve probably heard me saying the words “recipe” and “custom function” interchangeably as the whole idea of a custom function is that you’re creating a recipe on how to “cook” your data with some “ingredients” (parameters) as the input.

In this case we’re going to be creating our own custom function which would look like this:

(myFile as binary) => Table.PromoteHeaders( Csv.Document(myFile, [Encoding=1252]) )

image

I highly recommend that you read my series on Parameters on Functions (here) to find out more on how a custom function could be created.

The only thing that this function does is interpret the binary as a CSV file (using the Csv.Document function) and then promoting the values from the first row of that file to be the column headers.

Step 4: Invoking the Custom Function

Now with your custom function created, you can now invoke that function against our primary query where we buffered the binaries.

You’ll end up creating a custom step (by clicking the fx button on the formula bar) and then adding this piece of code:

= List.Transform( Content, fxRead)

image

and you’ll notice that now the list has Table values instead of Binaries.

Now we end up with a list of tables, which is exactly what the Table.Combine function needs and that’s the function that we’re going to use now:

Step 5: Using the Table.Combine function

Similar to what we did in the previous step, we need to create a custom step and now we’ll wrap the formula around the Table.Combine function and voilá:

image

All of the data is now combined in just 1 table and you can see how clean and straightforward this approach is.

I highly encourage you to test this approach against the Combine Files experience so you can time the difference specifically for you. It is really significant when you’re dealing with large volumes of data.

Here’s what the final code looks like:

let
Source = Folder.Files(“D:\Sales”),
#”Removed Other Columns” = Table.SelectColumns(Source,{“Content”}),
Content = List.Transform ( #”Removed Other Columns”[Content], Binary.Buffer),
Custom1 = List.Transform( Content, fxRead),
Custom2 = Table.Combine( Custom1 )
in
Custom2

image

Final words

I focused on using files from a folder, but this approach also works with any other data source. tables from a database, tables from a web service or any other data source of your choice. The end goal is to create a list of Tables using the method that you see fit and then use the Table.Combine function.

Share your stories with me! Have you tried optimizing your Combine Files query? how fast is it going now with this new approach?

Power BIPower Query
0 0 votes
Article Rating
Subscribe
Notify of
guest
33 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Claus Madsen
Claus Madsen
2 years ago

This was an inspiring post. I figured out a way to reuse this pattern when dealing which multiple SharePoint lists all having the same structure replacing Binary.Buffer with Table.Buffer ????

Content = List.Transform ( #”Removed Other Columns”[Content], Table.Buffer),

André P.
André P.
Reply to  Claus Madsen
2 years ago

Hey Claus,
I’m trying to do what you did.
Did you use the fxRead function as well?

Cheers.

André P.
André P.
Reply to  André P.
2 years ago

Nevermind, i understood now, and i managed to make it work 🙂

Thank you and special thanks to Miguel for this approach.

Cheers.

Eric Caron
Eric Caron
2 years ago

Hi Miguel, I do combine a lot of identical table objects from different xlsx files, the issue I face is performance…It take ages in my case when append the 60 files…Any proposal to get improve performance when combining tables object that are in xlsx files stored in the same SharePoint Library? (I saw your technique above works fine by creating a function to “transform” then combine csv files using buffer for improving performance).

Matan Baruch
Matan Baruch
2 years ago

Hi Miguel,

is there a way to write the function inside the same code for dataflow copy?

Matan Baruch
Matan Baruch
Reply to  Miguel Escobar
2 years ago

The code is fine. I just want to know if there is a way to combine the function query and the CSV import query together under one M code.

Matan Baruch
Matan Baruch
Reply to  Miguel Escobar
2 years ago

Can you please provide me the syntax as for this example?
I’m more familiar with DAX and less with complex M code.
I hope when you release the book at July I will upgrade my skill at M 🙂

Ricardo
2 years ago

Hi Miguel,

is there a way to keep the fila name column using this technique?

Jan Pedersen
Jan Pedersen
Reply to  Miguel Escobar
1 year ago

Hi Miquel

In “Step 2: Buffering the Binaries” you “drill drown” the “Content” column, this will keep only the ‘Content” column left.
But your video is about adding columns to “a-table-in-a-table”.

How can you keep or adding the filename after “drill down” the “binary” column?

Jan Pedersen Skive
Jan Pedersen Skive
Reply to  Miguel Escobar
1 year ago

I already tried to add a custom column to the table, after transform the binary list to a table. This will give ne add a column with a list of my filenames for each record. And not a single filename for each record.
Could you please guide me to only have one filename per record?

Jan Pedersen
Jan Pedersen
Reply to  Jan Pedersen
1 year ago

I can’t figure out how to do it.
I did in this way but have to skip your function “fxRead”

let
Source = Folder.Files(“\\thban1srv330\company$\Internal Export\XPO”),
FilteredAX_Rows = Table.SelectRows(Source, each Text.StartsWith([Name], “ax”)),
FilteredHiddenFiles = Table.SelectRows(FilteredAX_Rows, each [Attributes]?[Hidden]? true),
TransFormContentToTable = Table.AddColumn(FilteredHiddenFiles, “Custom”, each Csv.Document([Content])),
AddFileNameToTable = Table.AddColumn(TransFormContentToTable, “Custom.1”, each Table.AddColumn([Custom], “File Name”, (R)=> [Name])),
RemoveContentCol = Table.SelectColumns(AddFileNameToTable,{“Content”, “Custom.1”}),
BufferedTable = List.Transform ( RemoveContentCol[Custom.1], Table.Buffer),
TableCombine = Table.Combine(BufferedTable)
in
TableCombine

André P.
André P.
2 years ago

Hey Miguel,
Is this approach still good if we fetch the files with Web.Contents or is it just for files?
I’m afraid my all my requests are being buffered and it is taking a lot of extra time.
Cheers.

André P.
André P.
Reply to  Miguel Escobar
2 years ago

Got it.
Thank you for always answering!

Matan Baruch
Matan Baruch
1 year ago

Hey Miguel,

When I’m trying to put the fnCSV_Read and the Combine process together in one code (Query), it works on the PowerBI desktop version but getting error on the service via dataflow (New Entity). why is that happening and how can i fix it?

let
fnCSV_Read = (MyFile as binary) =>
Table.PromoteHeaders(Csv.Document(MyFile, [Delimiter=”,”, Columns=4, Encoding=1255, QuoteStyle=QuoteStyle.None]))
in
let
Source = Folder.Files(“C:\Users\Matan\Desktop\Sales”),
#”Removed Other Columns” = Table.SelectColumns(Source,{“Content”}),
Content = List.Transform(#”Removed Other Columns”[Content],Binary.Buffer),
Custom1 = List.Transform(Content,fnCSV_Read),
Custom2 = Table.Combine(Custom1)
in
Custom2

Joanne Chung
Joanne Chung
1 year ago

Hi Miguel, Thank for the great post, I am just wondering, how do I include some of the files attribute to the actual table. e.g if I need the date time stamp of the files creation as a column of my table?

cem
cem
1 year ago

Hello Miguel,

I tried to use excel (.xlsx) files instead of csv but after “Step 4: Invoking the Custom Function” instead of a clean table it shows me ascii characters and only one column.
Is it because of the “Step 3: Create a Custom Function”? in your code you use “Csv.Document” but I couldn’t find the equivalent of it for excel files.

regards

Jim
Jim
5 months ago

How would the rxread function work with text file with no headers and fixed width, or pipe delimited file?

Shawn
2 months ago

How would the fxRead function change if you wanted to skip a row then promote headers?

Chloe Strasser
Chloe Strasser
2 months ago

Hello Miguel. Your website is great. I have been trying to 1) add a new column with the date the file was last modified as a row to my tables 2) as well as clean up the data by transforming a sample file. I can do 1) or 2). I don’t know how to do both in the powerbi dataset. Lastly, when I try and link it to a table in powerbi it says I don’t have it in table format. Do I have to convert binary to table? Thanks from your nonpower expert, but trying her best to learn.

https://www.youtube.com/watch?v=GgwXt4LVmsU&t=624s
https://www.youtube.com/watch?v=RAmSdxt863s