Miguel Escobar Published September 30, 2021

Stacked columns to Table

Power BIPower Query

Recently, I’ve been trying to be more active through more diverse set of communities around Power Query and the Data Preparation world.

I’m still not a Microsoft employee, so I’m trying to take this time to learn.

While navigating through Reddit, someone brought my attention to this thread which happens to be a challenge.

During my first try, at some time around 2am my local time, I created a solution for what I thought was the requested solution. It turned out that my brain doesn’t want to read things thoroughly at that time of the night, so I ended up creating a solution for a completely different problem which was far more interesting than the original one.

Note: The solution to the original problem is something that’s actually covered in the book that Ken Puls and I wrote which you can get here.

This blog post will focus on that more interesting scenario where you have a table with a column of stacked data, which just so happens to be a column of data on top of another column of data.

The scenario

Imagine a scenario where you have a table that looks like this:

And you want that to change to look like the following:

Something really crucial about this example is that the columns are not symmetrical. Some of them might have more rows than others, but they all of them are expressed in the same order – any missing rows should be treated as nulls and that’s why you see the last row on the Name column expressed as a null value.

This is what makes it difficult – we can’t follow a regular pattern like the ones that you’ve seen in the book. You need to resort to the M language to see how things could work. That’s what this blog post will show.

Step 1: Split the column into two

The cool thing about this scenario is that the values in the column do provide some really helpful information which is the name of the column each value should be assigned to.

I’m on a Mac, so I’ll be doing all of the following steps using the Power Query Online experience inside Power BI Dataflows.

To split the column you just need to right click the column that you want to split and select the option Split column > By delimiter:

After doing that, the table should now look like this:

As a suggested step, I recommend that you wrap the code for the step around a Table.Buffer to preserve the order of the data.

Step 2: Group the values by the name of the column

We can use the name of the column from the Column1.1 to group the values from what would become the columns. To do that, you just need to use the Group by operation using the configuration from the image below:

The result of that operation will look like this:

Step 3: Extract just the values into a list

This next step is something that you could actually backwards implement in the previous step, but I’m showcasing it here to better explain the thought process behind this solution.

Since each table value in the MyRows column contains both the name of the column (in Column1.1) and the values of the column (in Column1.2), we want to extract the values from the Column1.2 column and expose them in a list.

To make that happen, simply go to the Custom Column and use function Table.Column to extract that column from the table.

You can then hit the OK button and the result will be a new column with List values as shown below:

You can now delete the MyRows column as we won’t need it anymore

Step 4: Creating a table from a list of columns

In the M language there’s a number of M functions or keywords called constructors. You may have seen some of them in the book such as #table, #date, #datetime and such. There’s also a couple of functions such as Table.FromColumns and Table.FromRows which are functions that try to create a table based on the values provided.

In this case, we will be using Table.FromColumns which just so happens to need a list of lists in order to create the table.

You can hit the fx button in the formula bar to add a new custom step and then add the code:

Table.FromColumns( #"Removed columns"[MyColumns] )

And that will create the table using only the values from the lists:

but wait, what happened with the column headers? Well, the Table.FromColumns does accept a second argument where you can define the columns however you want.

Let’s modify the previous code so we can add that second argument and tell the function exactly the names of the columns:

Table.FromColumns( #"Removed columns"[MyColumns], #"Removed columns"[Column1.1] )

This is how that would look like:

The second argument accepts any type of column definition. In that sense, you could potentially pass a table type definition with the name of the columns and their respective data types so you don’t have to add the data types later

Conclusion

While this is a pretty bizarre scenario, I can imagine some systems outputting log files in this manner where something like Table.FromColumns or even Table.FromRows could be the best route to take.

Unfortunately, none of these functions scale well. When you have to deal with hundreds of thousands or millions of rows, taking this approach will take quite a few minutes to process. There are always other alternatives, but they are usually made taking a more investigative approach as to how your data is structured and trying to find patterns or things that could be used as leverage on your solution.

Below is the full sample code of the query created in this blog post:

let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8kvMTbX1zUwvTc1RitWB8r1T8xAcp8wciJRzfmleSVGlbUBiXmJuIoqQM1AoBVUoNNgRhe+bWpGZnA8WCkvMKU21NTRA4hghc4yROSZATiwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
  #"Split column by delimiter" = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter("="), {"Column1.1", "Column1.2"}),
  #"Grouped rows" = Table.Group(#"Split column by delimiter", {"Column1.1"}, {{"MyRows", each _, type nullable table[Column1.1 = nullable text, Column1.2 = nullable text]}}),
  #"Added custom" = Table.AddColumn(#"Grouped rows", "MyColumns", each Table.Column([MyRows], "Column1.2")),
  #"Removed columns" = Table.RemoveColumns(#"Added custom", {"MyRows"}),
  Custom = Table.FromColumns( #"Removed columns"[MyColumns], #"Removed columns"[Column1.1] )
in
  Custom
Power BIPower Query
5 2 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments