2018-11-29

Power BI: Parameter Table and Dynamic Date Range

As we know Power BI is all about filter context, which brings us quite a lot convenience. But for some cases, it might not be what we want. For instance, what if I want to show amount of sales for last N months by a calendar month slicer? Because the slicer puts a constraint on the calendar month, you won't be able to show sales amount month by month. As illustrated below, how can we see total sales for 2012-12, 2012-11, etc.?





Normally what we can do is creating a disconnected date dimension, and using this disconnected date dimension for the slicer. But sometimes it could be very confusing to end users. So instead of creating a second date dimension, we can try to use date member from the fact table directly. So let's start today's topic:

Firstly the demonstration I built is from AdventureWorksDW2014.

To make it simple, I only loaded DimDate and FactInternetSales tables. Then, I removed default relationships and create the date relationship between FullDateAlternateKey and OrderDate columns directly.


Finally I created a custom column in the DimDate table for calendar month:

Calendar Month = FORMAT(DimDate[FullDateAlternateKey], "yyyy-MM")

Then we need a parameter table, which gives us control of how many months of sales user wants to see. In the real world scenario, we may need to create a junk dimension to hold these values. But in this demonstration, I just manually input the data.

Now let's create two simple measures:

Total Sales = SUM(FactInternetSales[SalesAmount])

Selected Period = IF(HASONEVALUE('Parameter Table'[Parameter Value]), VALUES('Parameter Table'[Parameter Value]))


If we follow normal steps to create visuals (create a slicer from 'DimDate'[Calendar Month], create a slicer from 'Parameter Table'[Parameter Name], then drop the table visual for calendar month and total sales), you will find the table visual won't show last N months sales, because the calendar month slicer puts a filter context to the fact table.

So let's make three changes to make it work:

1. add a calculate order month column into the fact table

Order Month = FORMAT(FactInternetSales[OrderDate], "yyyy-MM" )

2. add a measure for total sales of last N months. (note: Max function gives us the last date within the current filter context - the calendar month slicer value in this case)

Sales (in period) = CALCULATE([Total Sales], DATESINPERIOD(DimDate[FullDateAlternateKey], MAX(DimDate[FullDateAlternateKey]), -[Selected Period], MONTH))

3. now instead of using 'DimDate'[Calendar Month] and [Total Sales] for the table, we drag and drop 'FactInternetSales'[Order Month] and [Sales (in period)] into the table visual.



Easy job, isn't it? :D

2018-11-24

Pull Exchange email attachment via Exchange Service Binding

To be honest, I had never thought there would be a need to pull attachment from on-premise Exchange Server via Exchange Service Binding, in the year of 2018. But I was wrong :(

Basically we have two ways to read email from on-premise Exchange. The first option is utilizing Exchange Web Service managed API. It can be used for both on-premise Exchange server and O365 (an example I blogged before). The other option is today's topic: Exchange Service Binding, which by any means, should be avoid! Now let's have a look the basic routine:
  • find the root of the email account;
  • locate the base email folder (normally it should be the inbox folder);
  • pull emails from the base folder;
  • get email attachments' id from the email item;
  • pull attachments from email by attachment item id;

To locate the root folder of the account (NOT the inbox folder)

        public static string GetRootFolder(ExchangeServiceBinding esb)
        {
            DistinguishedFolderIdType[] dfit = new DistinguishedFolderIdType[1];

            dfit[0] = new DistinguishedFolderIdType();
            dfit[0].Id = DistinguishedFolderIdNameType.root;

            FolderResponseShapeType frst = new FolderResponseShapeType();
            frst.BaseShape = DefaultShapeNamesType.AllProperties;

            GetFolderType gftRoot = new GetFolderType();
            gftRoot.FolderIds = dfit;

            gftRoot.FolderShape = frst;
            GetFolderResponseType gfrt = esb.GetFolder(gftRoot);
            FolderInfoResponseMessageType firmt = ((FolderInfoResponseMessageType)gfrt.ResponseMessages.Items[0]);

            if (firmt.ResponseClass == ResponseClassType.Success)
                return ((FolderInfoResponseMessageType)gfrt.ResponseMessages.Items[0]).Folders[0].FolderId.Id;
            else
                return null;

        }


After we have the root, we can navigate to the specific folder by folder name, inbox, archive, deleted folder, whatever you like :)

        public static BaseFolderType GetFolder(ExchangeServiceBinding esb, string parentFolderId, string folderName)
        {
            if (esb == null || string.IsNullOrEmpty(parentFolderId))
                return null;

            FolderIdType[] fit = new FolderIdType[1];
            fit[0] = new FolderIdType();
            fit[0].Id = parentFolderId;

            FolderResponseShapeType frst = new FolderResponseShapeType();
            frst.BaseShape = DefaultShapeNamesType.AllProperties;

            PathToUnindexedFieldType ftFolderName = new PathToUnindexedFieldType();
            ftFolderName.FieldURI = UnindexedFieldURIType.folderDisplayName;

            ConstantValueType cvt = new ConstantValueType();
            cvt.Value = folderName;

            FieldURIOrConstantType ctFolderName = new FieldURIOrConstantType();
            ctFolderName.Item = cvt;

            ContainsExpressionType cet = new ContainsExpressionType();
            cet.Constant = cvt;
            cet.Item = ftFolderName;
            cet.ContainmentComparison = ContainmentComparisonType.IgnoreCase;
            cet.ContainmentComparisonSpecified = true;
            cet.ContainmentMode = ContainmentModeType.FullString;
            cet.ContainmentModeSpecified = true;
            RestrictionType rt = new RestrictionType();
            rt.Item = cet;
             
            FindFolderType fft = new FindFolderType();
            fft.Traversal = FolderQueryTraversalType.Deep;
            fft.ParentFolderIds = fit;
            fft.FolderShape = frst;
            fft.Restriction = rt;

            FindFolderResponseType ffrt = esb.FindFolder(fft);

            ResponseMessageType rmt = ((ResponseMessageType)ffrt.ResponseMessages.Items[0]);

            if (rmt.ResponseClass == ResponseClassType.Success)
            {
                BaseFolderType[] bfts = ((FindFolderResponseMessageType)ffrt.ResponseMessages.Items[0]).RootFolder.Folders;
                if (bfts.GetLength(0) > 0)
                    return bfts[0];
                else
                    return null;
            }
            else
                return null;
        }



Now we can get all email items from the folder we located

        public static List GetEmailItems(ExchangeServiceBinding esb)
        {
            List lstItems = new List();

            FindItemType fitRequest = new FindItemType();
            fitRequest.Traversal = ItemQueryTraversalType.Shallow;
            ItemResponseShapeType irst = new ItemResponseShapeType();
            irst.BaseShape = DefaultShapeNamesType.AllProperties;
            fitRequest.ItemShape = irst;

            DistinguishedFolderIdType[] dfits = new DistinguishedFolderIdType[1];
            dfits[0] = new DistinguishedFolderIdType();
            dfits[0].Id = DistinguishedFolderIdNameType.inbox;
            fitRequest.ParentFolderIds = dfits;

            FindItemResponseType fitResponse = esb.FindItem(fitRequest);
            FindItemResponseMessageType findItemResponseMessageType = (FindItemResponseMessageType)fitResponse.ResponseMessages.Items[0];

            var responseContents = findItemResponseMessageType.RootFolder.Item;
            var items = ((ArrayOfRealItemsType)responseContents).Items;

            if (items != null && items.Length > 0)
            {
                foreach (var item in items)
                {
                    lstItems.Add(item);
                }
            }

            return lstItems;
        }



Now here is a very important concept: if you google Exchange Service Binding, you will find the concept of the attachment in ESB is a bid confusing: the Attachments property returned from email ItemType is not the actual attachment itself. So we need to get the item id of the attachment, then we can get the actual attachment from the item id. Below is the code to construct a dictionary object for attachment id and attachment name:

        public static Dictionary GetAttachments(ExchangeServiceBinding esb, ItemIdType itemId)
        {
            Dictionary attachments = new Dictionary();

            GetItemType gitRequest = new GetItemType();
            gitRequest.ItemIds = new ItemIdType[] { itemId };
            gitRequest.ItemShape = new ItemResponseShapeType();
            gitRequest.ItemShape.BaseShape = DefaultShapeNamesType.AllProperties;

            PathToUnindexedFieldType hasAttachPath = new PathToUnindexedFieldType();
            hasAttachPath.FieldURI = UnindexedFieldURIType.itemHasAttachments;

            PathToUnindexedFieldType attachmentsPath = new PathToUnindexedFieldType();
            attachmentsPath.FieldURI = UnindexedFieldURIType.itemAttachments;

            gitRequest.ItemShape.AdditionalProperties = new BasePathToElementType[] { hasAttachPath, attachmentsPath };

            GetItemResponseType giResponse = esb.GetItem(gitRequest);
            ItemInfoResponseMessageType iirm = (ItemInfoResponseMessageType)giResponse.ResponseMessages.Items[0];

            if (iirm.ResponseCode == ResponseCodeType.NoError)
            {
                ItemType it = iirm.Items.Items[0];

                if (it.HasAttachments && it.Attachments != null && it.Attachments.Length > 0)
                {
                    foreach (AttachmentType attachmentItem in it.Attachments)
                    {
                        FileAttachmentType fat = (FileAttachmentType)attachmentItem;
                        if (fat != null)
                        {
                            attachments.Add(fat.AttachmentId.Id, fat.Name);
                        }
                    }

                    
                }
            }

            return attachments;
        }



Then we can get the attachment and save to the disk

        public static void GetAttachmentsOnItem(ExchangeServiceBinding esb, ItemIdType itemId, string destPath)
        {
            GetItemType gitRequest = new GetItemType();
            gitRequest.ItemIds = new ItemIdType[] { itemId };
            gitRequest.ItemShape = new ItemResponseShapeType();

            gitRequest.ItemShape.BaseShape = DefaultShapeNamesType.IdOnly;

            PathToUnindexedFieldType hasAttachPath = new PathToUnindexedFieldType();
            hasAttachPath.FieldURI = UnindexedFieldURIType.itemHasAttachments;

            PathToUnindexedFieldType attachmentsPath = new PathToUnindexedFieldType();
            attachmentsPath.FieldURI = UnindexedFieldURIType.itemAttachments;
            gitRequest.ItemShape.AdditionalProperties = new BasePathToElementType[] { hasAttachPath, attachmentsPath };

            GetItemResponseType getItemResponse = esb.GetItem(gitRequest);

            ItemInfoResponseMessageType getItemResponseMessage = (ItemInfoResponseMessageType)getItemResponse.ResponseMessages.Items[0];

            if (getItemResponseMessage.ResponseCode == ResponseCodeType.NoError)
            {
                ItemType item = getItemResponseMessage.Items.Items[0];

                if (item.HasAttachments && (item.Attachments != null) && (item.Attachments.Length > 0))
                {
                    List lstRait = new List();

                    for (int attachmentIndex = 0; attachmentIndex < item.Attachments.Length; attachmentIndex++)
                    {
                        FileAttachmentType fat = (FileAttachmentType)item.Attachments[attachmentIndex];
                        if (fat != null)
                        {
                            RequestAttachmentIdType rait = new RequestAttachmentIdType();
                            rait.Id = fat.AttachmentId.Id;
                            lstRait.Add(rait);
                        }
                    }

                    GetAttachmentType gat = new GetAttachmentType();
                    gat.AttachmentShape = new AttachmentResponseShapeType();
                    gat.AttachmentIds = lstRait.ToArray();

                    GetAttachmentResponseType getAttachmentResponse = esb.GetAttachment(gat);

                    foreach (AttachmentInfoResponseMessageType attachmentResponseMessage in getAttachmentResponse.ResponseMessages.Items)
                    {
                        if (attachmentResponseMessage.ResponseCode == ResponseCodeType.NoError)
                        {
                            FileAttachmentType fileAttachment = attachmentResponseMessage.Attachments[0] as FileAttachmentType;
                                                    
                            using (FileStream file = File.Create(Path.Combine(destPath, fileAttachment.Name)))
                            {

                                file.Write(fileAttachment.Content, 0, fileAttachment.Content.Length);
                                file.Flush();
                                file.Close();
                            }

                        }

                    }

                }

            }

        }

2018-10-28

Copy Azure Blob via Rest API

Recently I was requested to work on a data procedure. Basically what I need to do is using SSIS to copy files between different Azure blob storage folders (allow me to use this term, I know there is no such concept in Azure blob. It is just for simplicity :P ).

It is a quite simple task if you can reference Microsoft.WindowsAzure.Storage dll in your package. But in my case the SSIS server doesn't have it. So I decided to use Azure Blob Rest API -- independent to the host environment.

The Azure Blob API's office document gives us a very clear explanation. To call the API, we need to identify the resource, create authorization header, and then issue the web request. But the problem is, I couldn't find any resource to tell you how to create the authorization header for copy. So if you are facing the same problem, continue to section below...

The first step is to create a helper method to generate authorization header, very standard implementation:


        public static String SignThis(String stringToSign, string accountKey, string accountName)
        {
            String signature = string.Empty;
            byte[] unicodeKey = Convert.FromBase64String(accountKey);
            using (HMACSHA256 hmacSha256 = new HMACSHA256(unicodeKey))
            {
                Byte[] dataToHmac = System.Text.Encoding.UTF8.GetBytes(stringToSign);
                signature = Convert.ToBase64String(hmacSha256.ComputeHash(dataToHmac));
            }

            String authorizationHeader = String.Format(
                  CultureInfo.InvariantCulture,
                  "{0} {1}:{2}",
                  "SharedKey",
                  accountName,
                  signature);

            return authorizationHeader;
        }


The second step is to declare our variables


            string accountKey = "my acccount key";
            string accountName = "my account name";
   
            string srcBlob = "source blob name";
            string destBlob = "destination blob name";
   
            string containerName = "my blob container name";
   
            string method = "PUT";
            DateTime dt = DateTime.UtcNow;
   
            string reqUrl = @"https://" + accountName + ".blob.core.windows.net/" + containerName + "/" + destBlob;
            string copySource = @"https://" + accountName + ".blob.core.windows.net/" + containerName + "/" + srcBlob;
            string canonicalizedHeaders = string.Format("/{0}/{1}/{2}", accountName, containerName, destBlob);



Here is a very important step: we need to create the string for sign. Each line in this string has different meaning, and you can find detailed explanation here. Fortunately we just need to populate mandatory components, like this:


            string StringToSign = String.Format(
                method + "\n"
                + "\n" // content encoding
                + "\n" // content language
                + "\n" // content length
                + "\n" // content md5
                + "\n" // content type
                + "\n" // date
                + "\n" // if modified since
                + "\n" // if match
                + "\n" // if none match
                + "\n" // if unmodified since
                + "\n" // range
                + "x-ms-copy-source:" + copySource + "\nx-ms-date:" + dt.ToString("R") + "\nx-ms-version:2015-04-05\n"
                + canonicalizedHeaders);


Notice here the version I used is 2015-04-05. Technically you should be able to use any version Microsoft provided, e.g. the latest is 2018-03-28.

Now we have the string, then we only need to sign this string, and issue the web request


            string auth = SignThis(StringToSign, accountKey, accountName);

            Uri reqUri = new Uri(reqUrl);

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(reqUri);
            request.Method = method;
            request.Headers.Add("x-ms-date", dt.ToString("R"));
            request.Headers.Add("x-ms-version", "2015-04-05");
            request.Headers.Add("Authorization", auth);
            request.ContentLength = 0;
            request.Headers.Add("x-ms-copy-source", copySource);

            using (var response = request.GetResponse())
            {
                var status = ((HttpWebResponse)response).StatusCode;

                if (status != HttpStatusCode.Accepted)
                {
                    Dts.TaskResult = (int)ScriptResults.Failure;
                }
                else
                {
                    Dts.TaskResult = (int)ScriptResults.Success;
                }
            }


Enjoy :)

2018-09-25

Power BI/Power Query: Load JIRA Issues in Parallel

For nearly a year I haven't done any posting, as I was doing some works not relevant to SQL things, AR/MR with Unity, Machine Learning/Deep Learning, and Flask development. Now I am back to data related works so hopefully I will restart blogging my data experience. Let's see what we can do today.

Due to performance issue, JIRA made the decision to limit amount of records returned from its Rest API to 100 (here). But what if we are working on a large number of issues? like this



So we need to make the API call for more than 200 times. So depends on business requirement, the Power BI report could be just too long to refresh.

An alternative way to handle large amount of issues is we call the API in parallel. To do that, firstly let's create a basic function ufnGetJson to call the API (replace the Organization place holder with correct value):

 (nStartAt) =>  
     let   
       query = Json.Document(Web.Contents("https://{organization}.atlassian.net/rest/api/2/search?maxResults=100&startAt=" & Number.ToText(nStartAt)))  
     in  
       query  


Then based on this ufnGetJson function, we can get total amount of issues and split the stream to multiple batches (5 batches in below example)

 ufnGetRange  
 (n) =>  
     let  
       tmpResult = ufnGetJson(0),  
       totalItem = tmpResult[total],  
       cnt = totalItem / 5,  
       returnRange = {(n-1) * Number.RoundDown(cnt)..n * Number.RoundUp(cnt)}    
     in  
       returnRange  


Code listed above provides us with a boundary of the issue numbers we need to call. Now we just need to create 5 datasets and call the function separately, but keep in mind, we need to provide different n value. e.g. to get the first batch we can have the query like below

 let  
   rng = ufnGetRange(1),  
   startAt = List.Min(rng),  
   boundary = List.Max(rng),  
   pages = {Number.RoundDown(startAt/100)..Number.RoundDown(boundary/100)},  
   pageRange = List.Transform(pages, each _ * 100),  
   paginated = List.Transform(pageRange, each ufnGetJson(_)),  
   #"Converted to Table" = Table.FromList(paginated, Splitter.SplitByNothing(), null, null, ExtraValues.Error)  
 in  
   #"Converted to Table"  

The last step is simply creating a new dataset by combining all 5 datasets into one.


 let  
   combined = Table.Combine({#"Issue Batch 1", #"Issue Batch 2", #"Issue Batch 3", #"Issue Batch 4", #"Issue Batch 5"}),  
      ....  
      ....  
      ....  




At my end this approach improves loading speed a lot, but obviously it depends on connection capacity. Let me know if there is any problem :)