2018-09-25

Power BI/Power Query: Load JIRA Issues in Parallel

For nearly a year I haven't posted anything, as I was working on things not related to SQL: AR/MR with Unity, machine learning/deep learning, and Flask development. Now I am back to data-related work, so hopefully I will restart blogging about my data experience. Let's see what we can do today.

Due to performance concerns, JIRA limits the number of records returned by its REST API to 100 per request (here). But what if we are working with a large number of issues? Like this:



So we would need to call the API more than 200 times, and depending on the business requirement, the Power BI report could simply take too long to refresh.

An alternative way to handle a large number of issues is to call the API in parallel. To do that, first let's create a basic function, ufnGetJson, to call the API (replace the {organization} placeholder with the correct value):

 (nStartAt as number) =>  
     let   
       // Fetch one page of issues (up to 100 per page), starting at nStartAt
       query = Json.Document(Web.Contents("https://{organization}.atlassian.net/rest/api/2/search?maxResults=100&startAt=" & Number.ToText(nStartAt)))  
     in  
       query  
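
As a side note, if the report needs scheduled refresh in the Power BI service, a variant of this function that passes the parameters through the Query option of Web.Contents keeps the base URL static, which the service handles better. This is just a sketch, not part of the original approach:

 (nStartAt as number) =>  
     let  
       // Same call as ufnGetJson, but maxResults/startAt are supplied via the
       // Query option instead of string concatenation on the URL
       response = Web.Contents(  
         "https://{organization}.atlassian.net/rest/api/2/search",  
         [Query = [maxResults = "100", startAt = Number.ToText(nStartAt)]]  
       ),  
       json = Json.Document(response)  
     in  
       json  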


Then, based on this ufnGetJson function, we can get the total number of issues and split the stream into multiple batches (5 batches in the example below):

 ufnGetRange  
 (n as number) =>  
     let  
       // One probe call just to read the total issue count
       tmpResult = ufnGetJson(0),  
       totalItem = tmpResult[total],  
       cnt = Number.RoundUp(totalItem / 5),  
       // Non-overlapping ranges: batch n covers issue numbers (n-1)*cnt to n*cnt - 1
       // (the original mix of RoundDown/RoundUp made adjacent batches overlap,
       // which would fetch some pages twice once the batches are combined)
       returnRange = {(n - 1) * cnt .. n * cnt - 1}    
     in  
       returnRange  
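
If you prefer to skip the boundary arithmetic altogether, here is a hypothetical variant (ufnGetOffsets is my own name, not from the original post) that partitions the page offsets directly, so each batch gets a clean, non-overlapping slice of pages:

 (nBatch as number, nBatches as number) =>  
     let  
       total = ufnGetJson(0)[total],  
       // All startAt offsets for the whole issue set: 0, 100, 200, ...
       allOffsets = List.Numbers(0, Number.RoundUp(total / 100), 100),  
       // Pages per batch, rounded up so nothing is dropped
       perBatch = Number.RoundUp(List.Count(allOffsets) / nBatches),  
       // List.Range clips at the end of the list, so the last batch may be shorter
       batchOffsets = List.Range(allOffsets, (nBatch - 1) * perBatch, perBatch)  
     in  
       batchOffsets  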


The code above gives us the issue-number boundary for each batch. Now we just need to create 5 datasets that call the function separately, each with a different n value. For example, to get the first batch we can use a query like this:

 let  
   rng = ufnGetRange(1),  
   startAt = List.Min(rng),  
   boundary = List.Max(rng),  
   // Convert the issue-number boundary into 100-issue page indexes...
   pages = {Number.RoundDown(startAt / 100)..Number.RoundDown(boundary / 100)},  
   // ...and then into startAt offsets (0, 100, 200, ...)
   pageRange = List.Transform(pages, each _ * 100),  
   paginated = List.Transform(pageRange, each ufnGetJson(_)),  
   #"Converted to Table" = Table.FromList(paginated, Splitter.SplitByNothing(), null, null, ExtraValues.Error)  
 in  
   #"Converted to Table"  
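
Note that at this point the table still holds one raw response record per page; to get to the actual issues you would typically expand the [issues] list afterwards, roughly like this (the field names id/key/fields are just the common JIRA response fields, adjust to what you actually need):

   // ...continuing from the steps above, before the final "in":
   withIssues = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"issues"}),  
   oneIssuePerRow = Table.ExpandListColumn(withIssues, "issues"),  
   issueColumns = Table.ExpandRecordColumn(oneIssuePerRow, "issues", {"id", "key", "fields"})  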

The last step is simply to create a new dataset that combines all 5 datasets into one.


 let  
   combined = Table.Combine({#"Issue Batch 1", #"Issue Batch 2", #"Issue Batch 3", #"Issue Batch 4", #"Issue Batch 5"}),  
      ....  
      ....  
      ....  
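
If the batch boundaries ever overlap (adjacent batches sharing an edge can fetch the same page twice), duplicates are easy to remove at this point. For example, assuming each row exposes the issue key column after expansion:

 let  
   combined = Table.Combine({#"Issue Batch 1", #"Issue Batch 2", #"Issue Batch 3", #"Issue Batch 4", #"Issue Batch 5"}),  
   // Keep one row per issue key in case two batches fetched the same page
   deduplicated = Table.Distinct(combined, {"key"})  
 in  
   deduplicated  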




At my end this approach improves loading speed a lot, but obviously the gain depends on connection capacity. Let me know if you run into any problems :)