Back Story: Back in the day, I used to work for this company with a truly all-star / amazing team. We were all part of this company called 20Th Century Fox, and we were in charge of the Theatrical distribution of films in Central America.
Business Intelligence wasn’t a standard in the industry – at least not how we know BI nowadays, and I was the main person leading the charge of using BI tools to analyze the sales aspect of the business – specifically the Box Office. And that’s how I found PowerPivot (without the spaces back then) and the DAX language. I went deep into it and even took the online Power Pivot Workshop that Marco & Alberto used to deliver back in December of 2011 .
I was able to implement a WHOLE bunch of reports, analysis and dashboards thanks to Power Pivot, but one thing that I couldn’t implement back then (because I started the journey with my own company), was a Comparison of “Similar” movies for the release planning of an upcoming movie.
See…before a film gets to a certain theater / cinema / studio / location, it has to go through a certain planning phase. It starts with an estimate on how much that movie might be able to make, but….how can we tell?
Well, you had a few options, but in the end it was all a manual approach where you’d need to think about what movies might be similar to this one (from every single perspective that you can imagine). Then you’d go through the system and search for every single similar movie one by one so you can grab what was the total gross revenue for each.
What if…you had a table for all the similar movies (usually called Comps) and you could simply select the Movie that you’re interested in and it’ll display all of the Comps?
Of course, this doesn’t only apply to movies, but to any product / service that you’d like to compare against similar products.
This is where this blog post comes in!
The Data Source
To replicate this scenario, we’re going to be getting Data from BoxOfficeMojo.
I went ahead and created some web-scraping queries to grab the Data from the following pages:
- The main movie page (url) – this will give me the general information about the movie like domestic total gross, release date, run time, budget, genre, and more
- Daily Box Office data (url) – this is all Domestic US data, but it can give me information about how was the opening day so I can compare the opening day / week / month of multiple films.
- Similar Movies (url) – this is where I can get all of the movies that are similar to the one I’m after.
I used Power BI to grab all the data from multiple films published on BoxOfficeMojo. If you’d like to see how I made that happen, leave a comment in this article. I’m curious in seeing how many people are interested in that.
OK – so I’ve placed all of this data inside of an Excel, so it’s easier to distribute and you can download that file as well as the final PBIX from the following button.
The Data Model
Usually, most people are trying to understand DAX before anything else, but I’d rather point people into knowing more about Data Modelling before going into more advanced DAX scenarios.
Close to 70% of the scenarios that I’ve faced, that were originally labeled as “Advanced DAX”, were simply solved by a better modelling technique – usually with the use of Power Query to shape the data in the most optimal way. See, Power Query is a great tool to shape your data and one of the cornerstone tools when it comes to Data Modelling with Power BI.
This is why we’re starting this with the Data Model.
Loading the Data
This is the easy part. Create a new Power BI Desktop report and connect to the Excel file that you just downloaded.
As shown in the image above, you need to select the 3 tables (not the worksheets) and load those 3 to your Data Model.
Creating the Relationships
The first thing that we need to do is simply load the 3 tables from the Excel workbook into Power BI. That’s the easy part, and then comes the also easy part of creating the relationship between tables.
The BOM_FIlms table from the Excel workbook is basically a dimension table of all the movies that we have. It’s primary key field is the one called ID.
That same field appears in the Daily Box Office table as a foreign key under the same field name of ID. Here, our Daily Box Office table is what we call a Fact table, because it has mainly factual data – daily sales (box office) for each movie.
Then we have the Comps table which looks like this:
and you can probably tell that both fields here are basically using the same field from the BOM_Films table. They’re both the ID of the film, but which one should we use for the relationship?
In this case we’ll be using the id field from the Comps table which will have a relationship with the ID field from the BOM_Films table.
Adding a new Table
As you can see from the video above, we want to search for the movie that will be our “main” film and see what similar movies it has. To make that happen, we can’t use the same field from the same table in both visualizations, so that’s why we need to create a new table which will have the ID and the name of the Film.
Instead of creating that with Power Query (and adding more calls against BoxOfficeMojo), we can create a simple Calculated Table like the following:
Let’s call this new table Selection.
Now we have 2 choices:
- We can create a solution with that table as a disconnected table and use something like KEEPFILTERS to pass that selected value to our filter context or
- Create a relationship between that new table and the Comps table
Both can get the job done, but in this case I’m going with the easiest solution for me, which is to create a relationship of that new table with the Comps table using the ID from the new Selection table against the Comp column from the Comps table.
Here’s a screenshot of what all of the relationships should be looking like:
Right now, all we need is to create the following measure:
Selected = CALCULATE( DISTINCTCOUNT( Comps[id]), BOM_Films)
once you create that, you can create a new report page using the Film Name from the Selection table as a slicer/filter, and then another visual using the newly created measure and the Film Name column from the BOM_Films table which will end up looking like this:
We can add a new measure like this:
Daily BOR = CALCULATE( SUM( ‘Daily Box Office'[Gross]), Comps, BOM_Films)
Which yields the Daily Box Office data for only the similar movies of the selected movie:
Making it look good: An art form
Not only does the Film need to look good, but this report also needs to look good. In the end, it’s all about creativity and what message we’re trying to convey through that report.
I created the following report with 2 pages that answer 2 main questions:
- What are the similar movies for the movies selected
- What’s the performance of those similar movies within their x days (to track opening day and more)
One of the great things about the new web-scraping experience with Power BI Desktop is that you can get to the attributes of the HTML page using CSS Selectors, which means that I’m able to get to the src and even the href pretty easily. In simple terms, I can grab the url of the poster of each movie with ease and you can see that field in the BOM_Films table under the image column.
You have to go to the Data view inside of Power BI Desktop and define the properties of that image column to be a Image url:
and once you do that you can actually add that field to your visual and it’ll display the poster of that movie:
Now, how can we make this look good? I had to do some research as to what was the best set of visuals to use in this case.
In the end, I chose to go with 2 custom visuals:
- Facet Key – to make the film search experience a bit better
- Card Browser – to display the similar comps in the most clean way whilst still showing the posters in their full form.