2017-09-17

Power BI/Power Query: API Response Pagination Part 2 - Paginate by Page Cursor

This is the 2nd post of the topic API Response Pagination in Power BI/Power Query. You can find the first post here.

In the first post we have seen an API pagination example in Power Query, in which the API response provides us with the total record count, then we can calculate amount of pages we need to query. So today let's see another typical scenario in API response: the response contains an cursor pointing to the next page. (Here the cursor could be a direct link brings you to the next page, or it could be a page token you need to apply to your next API call).

The example I demonstrated today is Youtube Search API, and the channel I am going to search is VicRoad (as this post is focusing on result pagination, I am not going to explain how to prepare the API call :P). Below is a well-formatted Json response returned from the call. As you can see, the key field (the cursor) we are going to use is "nextPageToken".



Similar to the last post, the first step is preparing a base call function, as shown below. However, this time we return a list, which contains the result set of the current call, as well as the nextPageToken field we can use for drilling calls.



Now here is the core function of today's topic: List.Generate(). Basically you can treat this function as the FOR loop: we initialize a variable, give it a stop condition, specify the incremental action to the variable, and finally populate the return list result. Below is what I wrote:

So what we did is quite clear now: we create a helper function to call the API; Inside the List.Generate() function, we call the function to repeatedly populate records on different pages by using nextPageToken value. So what we need to do is just turning the result to table? Hold on for one second!

Let's review this piece of codes. So initially we call the API without page token, and the return result set contains records from page 1 and token for page 2, hence the second call gives us records from page 2 and token for page 3...


Initial TokenResultsetNext Token
""page 1page 2
page 2page 2page 3
page 3page 3page 4
page 4page 4page 5
.........
last pagelast pagenull

Now you see the problem: our condition for generating the list is "for each NOT NULL nextPageToken". We haven't gone through the last page!

Fixing the problem is quite simple, we can just manually query the last page, and then combine the result sets:

Put all together:

let    
    ufnGetList = (pgToken) =>
        let 
            result = Json.Document(
                        Web.Contents("https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCkT7-rjW_Foojt0ZRtbXMdA&maxResults=50&type=video&key=[put your api key here]&pageToken=" & pgToken)
                        ),
            nextPageToken = try result[nextPageToken] otherwise null,
            items = result[items],
            record = [items = items, nextPageToken = nextPageToken]
        in 
            record,

    resultSet = List.Generate(
                    () => ufnGetList(""),
                    each _[nextPageToken] <> null,
                    each ufnGetList(_[nextPageToken]),
                    each [nextPageToken = _[nextPageToken], items = _[items]]
                    ),

    lastPageToken = List.Last(Table.FromRecords(resultSet)[nextPageToken]),
    lastResultSet = ufnGetList(lastPageToken)[items],

    fullResultSet = Table.Combine({Table.FromRecords(List.Combine(Table.FromRecords(resultSet)[items])), Table.FromRecords(lastResultSet)})
    
    
in
    fullResultSet


Now Run the query we should have all the records we need.



2017-09-03

Power BI/Power Query: API Response Pagination Part 1 - Paginate by Item Count

It has been a while since my last post, lots of things happened in these few month, changed my career path, obtained my certification... Finally I got a chance to sit down and restart the blog. So today's topic is pagination in REST API response.

As you can see in the past I had done a few posts of how to consume API results in SSIS via script components, such as this post. But in Power BI, it might not be as easy as what we did in the script component, because there is no direct way to do a loop when reading paginated response.

Generally there are two types of paginated result set:

-- Good: A Json response contains an indicator to provide you with the information of the next page, it could be a link for the next page, or it could be a token for you to issue the next call, or it could be a total item count so you need to do a calculation of how many pages you need to go through. This is today's topic.

-- Cursed: A Json result set does not have pagination information, so you need to query next page every time until the the response gives us nothing.

So, let's prepare today's post here. To demonstrate the logic, I am using the v3 SET (/api/v3/lego/sets) API from rebrickable. The API itself is quite simple, but you need to create an account, then generate a API key from your account profile to call the API.  It is a simple process so I won't cover it here.

After we obtain the API key, it is time to open the Power Query window in Power BI



From opened query window, select Web as the data source



In the pop up window, select Advanced option, and then input API endpoint and key value, as you can see below, I set page size to 100, and the initial page is 1.


Click the OK button and the query should run automatically and give you the result. As you can see the response is quite clear: it provides you total items in the result set, as well as the link you should use to visit the next page. In this example, I am going to use the "count" indicator to do the loop.


Now Power Query has built the basic routine for us. We need to manually modify the query
to return the result page by page. So firstly we open the query edit window by click Advanced Editor


Our query looks like screenshot captured below


In the query editor window, I transform the query into a function, nothing special, it simply accepts a page number and then return the Json response.


Here is the key part:
  • We pull the total item count from the initial call 
  • Then we generate a list for page numbers 
  • For each list item, we call the ufnCallAPI function

And the result returned from the query:

(note: 100/page might be too small to issue the call, if it is too slow, you can try to change the page_size parameter to 1000 in the function definition, as well as the pageRange list boundary to {1..Number.RoundUp(totalItems/1000)} )


As you can see, though there is no loop function available in Power Query (at this stage), we can still retrieve paginated API responses by using provided paging indicator. In my next post, I will cover the topic of how to handle no indicator response in Power Query. Enjoy.