Why is it so complex to connect to an API with Power BI or Power Query? The OAuth explained
It’s easier to explain using analogies, so let’s go with that!
Analogy: Imagine that you have a gym subscription and this gym also has a VIP lounge, but to get to the lounge you need to head to the front-desk to get authorization for a limited time.
Using MailChimp as an example:
- The gym is actually MailChimp – they put all the facilities for you
- The lounge section is their API and its where you can “access” your data
- The front-desk authorization is what we know as a Authorization Token, which grants us the chance to “be” in the API
I’m overly simplifying things here, but the main idea is that the “vip lounge” is the place that the “gym” gives you to do what they give you access to do. They (the gym) could give you access to read, write, delete or modify “data”.
Now, here’s the interesting part. Inside the VIP lounge, there are a number of doors that lead to other “places”, like a locker room, massage room, cafeteria and other facilities. These are what we call “endpoints” – places inside the actual “VIP lounge” where you can do specific actions.
Now, in the more technical sense. The “VIP lounge” address can be “https://us10.api.mailchimp.com/3.0”, but the exact address for that massage room inside the cafeteria could look like this “https://us10.api.mailchimp.com/3.0/massage-room” – so if we wanted to get straight into the massage room through Power BI or Power Query, we could create a web query to that specific url and inside the Header we would add the Authorization Token that we got from the front-desk.
You’re probably wondering, how do I get my Authorization Token from the Front Desk? Well, depending on the API service, you can simply use something called an API Key or use OAuth. We’ll go with OAuth in this case and, in order to get the authorization, you need to be authenticated through the Web Form that the gym gives you AND also be able to handle the response that they give you in the format and shape that they give it to you, which is different in every gym. You could do this part outside of Power Query and the Power BI Desktop, in your own browser, and then paste the Authorization Token as a new parameter in Power Query, that would work, but what happens when that Authorization Token expires and you need a new one?
So, how do I make this automatic? How can I get new Authorization Tokens on demand? – well, I forgot to tell you that besides giving you an Authorization Token, the Front-Desk usually gives you something called a “refresh token” so that once you’re inside the VIP lounge you can get a new Authorization Token (and a new Refresh token as well) right within the VIP lounge.
Here’s the full process that you need to go through in a normal API:
- Get to the front-desk and ask for authorization and refresh token through the web-form where you need to enter your credentials
- Head over to the specific endpoint inside the “VIP Lounge”
- Submit a GET or POST request (if you’re at the Cafeteria inside the VIP lounge, basically you’ll be requesting a beverage)
- GET = request data to be sent to you only
- POST = send data to the service and expect a response from the service
What are the limitations of Power BI Desktop & Power Query as of January 2017? the only limitation right now is the web-form in OAuth 2.0 and how we can’t set up that workflow. Now, we as end-users can’t do it, but the Power Query team can do it because they have access to other extensibility tools and that’s how new connectors for MailChimp, Google Analytics and such are being created.
We might be able to have the ability to create our own connectors at some point and I can see that being something amazing for Power Query in SSIS, for example, where you can create your own connector using Power Query and use the result of that query as a data source in SSIS. It’s just a matter of how or if the MSFT team can create a toolkit for developers and advanced Power Query users to create their own connector.
You can read more about the RESTful API here.
Using one of the API Connectors inside the Power BI Desktop
This blog post will cover how you can optimize your queries with the connectors available in the Power BI Desktop, but in part 2 I’ll show you how you can connect directly to the Youtube Analytics for Partners so you can extract your daily ad-revenue, views and more.
I’ll be using the MailChimp connector as an example, so before you begin, and in case you want to follow along, you’ll need the following tools:
- A MailChimp Account
- The Latest version of Power BI Desktop
- Fiddler (Optional
Head over to the ‘Get Data’ experience inside Power BI Desktop and select “MailChimp” from the available connectors. You’ll be prompted with a window where you’ll need to enter your credentials. This is the OAuth 2.0 happening right in front of your eyes. Next thing that you’ll see is a window called “Navigator” where you can see all the options that you available from the connector:
You might notice that there appear to be 2 folders and then 2 functions:
- Campaigns Folderthis folder contains pre-defined queries created by the MSFT team that will get you data related to the campaigns. You can take them as ready-to-consume API GET requests
- Lists Folderthis folder contains pre-defined queries created by the MSFT team that will get you data related to the Lists. You can take them as ready-to-consume API GET requests as well
- MailChimp.Collection functionInstead of a pre-defined or pre-packaged query, this is just a simple function that has the OAuth workflow embedded so you can create your own GET, POST or any kind of requests that you need to do. The only input needed is the name of the MailChimp Collection, for example, Lists or Reports
- MailChimp.Instance function similar to the MailChimp.Collection function, but in this one you can point towards an specific element of the API, like the email activity endpoint here
Go ahead and select the Campaign Summary option from the Campaigns Folder:
this query will give you a table with all the campaigns for every lists that you have in your MailChimp Account. I want to focus on just one list as this report will be focused on the English side of my business instead of the Spanish one, so I can filter this table by list ID and that way only get campaigns from that specific list. I could even go further and only select. In the end, I’ll use this table as a Dimension table in my Data Model and I’d also be using this table to make other calls to the MailChimp API Service.
After doing a few clean ups, I finally ended with a simple Campaigns table with only 2 columns. The ID of the campaign and the timestamp when it was first sent:
I’m going to reference that Campaigns query and then try to add a new column using the MailChimp.Instance function to make my own GET request to the email activity endpoint. When I try that, I immediately get an error like this:
(a formula firewall – we can’t combine 2 different data sources this way if their privacy levels are different)
Fortunately, there are 2 ways to fix this:
- Head over to the File Menu > Options and Settings > Options > Privacy tab> “Ignore the Privacy Levels and potentially improve performance”
- Create everything in just 1 query and then simply create a duplicate or a reference and delete the duplicates
Before you make a choice….
Here’s a few things that you need to take in consideration:
The Power BI Desktops comes with “Parallel Loading of Tables” enabled in his options. This is mainly for query folding against relational databases, but in our case this might impact us in a really bad way.
Why would it affect me? well, imagine that you have a query that makes 20 calls to the service. You reference this query and, since this is running in parallel and not in series, you are now making 40 calls to the service. What if in this referenced query you need to also make a # of new requests? then you’d need to multiply the 20 calls of your current query times the new calls that you’ll be making per row. This would be sure to time you out OR, even worst, reach your api calls quota limit in no time before even putting your report to use.
Even when you disable this Parallel Loading options, you sometimes get parallel processing in your queries which might cause you to make undesired calls to the API service and this is why you need to use Fiddler to monitor your API calls and check if your solution is working as intended.
My recommendation would be to leave the Parallel Loading of Tables enabled and simply make all the calls needed in just 1 query so you don’t get hit by the Parallel processing running the same query multiple times
You should really check out this thread from someone that had some issues when dealing with Web API requests and the response from MSFT.
In short, here a list of some recommendations:
- Use Function.InvokeAfter to add a timer to your calls – something among the lines of make a call every 2 seconds so you don’t get timed out by multiple connections at the same time
- Caching and Lazy evaluation on the response from the server – use Binary.Buffer() against the Binaries that you get from the service so we get a local copy of the binary and not repeat the same call over and over again
- Use Fiddler or the service log to see how many calls you’re making – try to see how much time you spend on the handshake vs actually getting the data. Throttling can be a performance issue, too many calls at the same time and you might get timed out as well. This would require you to check your query plan
After spending quite a few weeks with the MailChimp API, I’ve reached the conclusion that the way to go would be to create all your calls within one query and then create new tables in your Data Model using DAX
Click here to check the Part 2 of this blog post on how to connect to the YouTube Analytics API with a Power BI Custom Connector (which uses OAuth 2.0 as an authentication method).
Update May-19-2018: Power BI Custom Connectors are now possible! You can create your own connector and connect to any data source of your choice using authentications like OAuth 2.0 – more information here
I am having many problems with the API of JIRA.
The online update of my Power BI dashboard gives me the following error:
to combine data] Section1/Datos/AutoRemovedColumns1 is accessing data
sources that have privacy levels which cannot be used together. Please
rebuild this data combination.”
In spite of having used all the tricks that you indicate in this post.
Can someone give me some other indication?
that seems like the formula firewall error. This usually happens when you try referencing a query using a Reference, Merge or Append operation. Without looking at the query and how it’s being created is hard to make any suggestions. I’d suggest that you post your question with more details on the official Power BI forum to get support or even raise a new ticket to the Microsoft team from inside the Power BI service.
Create a ticket> You’ll need to create this inside your O365 Portal
Thanks for the answer Miguel (Gracias!)
I have put my questions in the Power BI forum but I didn’t find the answer that solve my problem. I found this post and I have followed all the steps without success. I have modified the JIRA content pack to get information of every issue and that is where my problem comes from.
First, I get the list of issues with the same function of the JIra Content Pack. And after that, I use another function to get the info of every Issue, like this:
My only suggestion is that you might want to try and check the code for ‘FetchPages’ in that code. If ‘FetchPages’ is a reference to another query, and you’re using this function outside of the context of the original query, then that’s the culprit of the problem. If you can define everything within one query, without ever needing to define a function within that query, then you’ll be hitting gold. Also, don’t forget to set up the privacy levels of the data sources – perhaps they are undefined and that’s the simple reason why you are unable to combine the data. If your end goal is to publish your solution to Power BI, then do not ignore the privacy levels.
I wish that I could be of more help, but I’m completely swamped for the next few days. If you can, leave the link to the thread on the forum here and I’ll try and take a look at it once I have more availability.
Thanks for the answer and thanks for the ideas. I’ll try and tell you the results.
P.d.: Igual en español nos entenderíamos mejor? 😉
claro! si tienes oportunidad, comparte el enlace al thread conmigo para poder darle un vistazo y compartirlo con mis colegas dentro y fuera de Microsoft que tal vez te podrían ayudar.
Here is the link to the thread in Power BI Community:
Thanks for your time!
I need to connect to Google Flights / QPX Express, and am not sure this is doable at all within Power Query?
Your opinion would be much appreciated!
From what I’ve read here:
It is entirely possible to accomplish this in Power Query either with an API Key or with OAuth (and the creation of a Custom Connector) authentication. From my experience creating solutions against Web APIs, I can tell you that the hardest part is actually reading through the API documentation and truly understanding how everything is laid down in order to create your desired query.
If you are trying to create a custom connector against the Google APIs then this article might help you:
I also provide consultancy to help you create either Custom Connector or solutions over the Native Web Connector that Power Query offers. If you’d like to inquire more about those services, please send us a message using the contact form found on this link: https://www.thepoweruser.com/contact/
Hey Miguel, i am trying to implement Binary.Buffer like you said. How can i see if it made any difference? My queries are a bit slow, same as before.
The buffer makes the files be stored in memory (RAM), so you might see a spike in your memory usage. If it has a good or bad impact on your query, it depends on your data source and also on how your query does after the buffer
Is it just me or has the mailchimp connector disappeared from Power BI?
I believe it was deprecated at one point.
Thanks for writing the blog with clear instructions.
I have been trying to create a custom connector to zoho CRM with powerBI. I have been able to enter the login credentials as well of crm , but struck on the below error.
Error message: [DataFormatError] We couldn’t parse your query string as it was improperly formatted
Can you help me in any clues around this problem as I could find similar posting on internet but no solution around it.
The best thing to do is always to monitor your calls with something like Fiddler to see exactly what you’re requesting the API and then check the raw response from that API.
Usually that error is in relationship to the query parameters of your url not being formatted correctly.