2019-12-19

Retrieve SharePoint Online List by Graph API

Last day in 2019, and last post in 2019.

Actually pulling SharePoint List from Power Query/Power BI is much faster and straightforward. But in some cases, you may still need to follow the traditional ETL way. Hope below script can help you if you need it.

Firstly we need to register the app in Azure AD (Check Here). After create your app, write down 3 ids: tenant id, client id, and the secret. Note the secret will only be exposed to you when you create the app, then it will be masked (though you can always create a new one.)


To allow your app to access the SharePoint list, you need to enable the "Site.Read.All" permission under the Application Permission category, as demonstrated below. Keep in mind though, the "Site.Read.All" permission needs AAD admin's consent.




That is all about the app setup. To query the list, we need to exchange the token by our tenant id, client id, and the secret:

 public class RespToken
        {
            public string token_type { get; set; }
            public int expires_in { get; set; }
            public int ext_expires_in { get; set; }
            public string access_token { get; set; }
        }

        public static string GetAccessToken(string tenentId, string clientId, string clientSecret)
        {
            string token = string.Empty;

            string uriToken = string.Format("https://login.microsoftonline.com/{0}/oauth2/v2.0/token", tenentId);
            string contentType = "application/x-www-form-urlencoded";
            string grantType = "client_credentials";
            string scope = "https://graph.microsoft.com/.default";

            string tmpBody = "grant_type={0}&client_id={1}&client_secret={2}&scope={3}";
            string body = string.Format(tmpBody, grantType, clientId, clientSecret, scope);
            byte[] contents = Encoding.UTF8.GetBytes(body);

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uriToken);
            request.ContentType = contentType;
            request.Method = "POST";

            request.ContentLength = contents.Length;
            using (Stream writeStream = request.GetRequestStream())
            {
                writeStream.Write(contents, 0, contents.Length);
            }

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (Stream readStream = response.GetResponseStream())
                {
                    string responseBody = string.Empty;
                    using (StreamReader sr = new StreamReader(readStream))
                    {
                        responseBody = sr.ReadToEnd();
                    }
                    JavaScriptSerializer Serializer = new JavaScriptSerializer();
                    var r = Serializer.Deserialize(responseBody);
                    token = r.access_token;
                }
            }
            
            return token;
        }




Then we follow the routine "search site => search list => get list item":

static void Main(string[] args)
        {

            string tenentId = string.Empty;
            string clientId = string.Empty;
            string clientSecret = string.Empty;
            string siteName = string.Empty;

            string siteId = string.Empty;
            string listId = string.Empty;


            Console.WriteLine("Tenent Id: ");
            tenentId = Console.ReadLine();

            Console.WriteLine("Client Id: ");
            clientId = Console.ReadLine();

            Console.WriteLine("Client Secret: ");
            clientSecret = Console.ReadLine();

            if (string.IsNullOrEmpty(tenentId) || string.IsNullOrEmpty(clientId) || string.IsNullOrEmpty(clientSecret))
            {
                Console.WriteLine("empty value found in mandatory parameters");
                return;
            }

            string token = GetAccessToken(tenentId, clientId, clientSecret);
            Console.WriteLine("Token acquired, query site");
            

            Console.WriteLine("Input site name for search: ");
            siteName = Console.ReadLine();

            string uriSite = $"https://graph.microsoft.com/v1.0/sites?search={siteName}";

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uriSite);
            request.Method = "GET";
            request.Headers.Add(HttpRequestHeader.Authorization, token);

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (Stream s = response.GetResponseStream())
                {
                    using (StreamReader r = new StreamReader(s))
                    {
                        string contents = r.ReadToEnd();
                        var j = JObject.Parse(contents);
                        siteId = j["value"]
                            .FirstOrDefault(x => x["name"].ToString().ToLower() == "msgraph")["id"]
                            .ToString();

                    }
                }
            }

            if (string.IsNullOrEmpty(siteId))
            {
                Console.WriteLine("error happens, cannot obtain site id");
                return;
            }


            Console.WriteLine($"site id acquired, {siteId}");
            string uriList = $"https://graph.microsoft.com/v1.0/sites/{siteId}/lists";
            request = (HttpWebRequest)WebRequest.Create(uriList);
            request.Method = "GET";
            request.Headers.Add(HttpRequestHeader.Authorization, token);

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (Stream s = response.GetResponseStream())
                {
                    using (StreamReader sr = new StreamReader(s))
                    {
                        string content = sr.ReadToEnd();
                        var jObject = JObject.Parse(content);

                        listId = jObject["value"]
                            .FirstOrDefault(x => x["name"].ToString().ToLower() == "events")["id"]
                            .ToString();
                    }
                }
            }

            if (string.IsNullOrEmpty(listId))
            {
                Console.WriteLine("error occurred, cannot obtains list id");
                return;
            }

            Console.WriteLine($"list id: {listId}");

            Console.ReadLine();

        } 



2019-12-16

Power Query for Twitter API

I am lazy, so let's make it quick. Looping twitter could be a bit trouble so I put power query script (for searching a specific account by timeline API) below. Hope it helps

1. A function to get token

let
    fnGetToken = () =>
        let 
            key = "Basic " & Binary.ToText(Text.ToBinary("{you should know what it is}"),0),
            url = "https://api.twitter.com/oauth2/token",
            getJson = Web.Contents(url,
                [
                    Headers = [#"Authorization"=key,#"Content-Type"="application/x-www-form-urlencoded;charset=UTF-8"],
                    Content = Text.ToBinary("grant_type=client_credentials") 
                ]
            ),
            j = Json.Document(getJson),
            token = j[access_token],
            tokenHeader = "bearer " & token
        in tokenHeader
in
    fnGetToken


2. Another function to get the max twitter id regarding the topic

let
 fnGetMaxId = (token) =>
 let 
  GetJsonQuery = Web.Contents("https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name={this is the account}&count=1",
   [
    Headers = [#"Authorization"=token]
   ]
  ),
  FormatAsJsonQuery = Json.Document(GetJsonQuery),
  t = Table.FromList(FormatAsJsonQuery, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
  maxId = Table.ExpandRecordColumn(t, "Column1", {"id_str"}, {"maxid"})
 in 
  maxId
in
 fnGetMaxId


3. the 3rd function to get result by token and endpoint. (technically you don't need it, just to make the main function easy to read)

let
 fnGetList = (token, endpoint) =>
  let 
   req = Web.Contents(endpoint,
    [
     Headers = [#"Authorization"=token]
    ]
   ),
   json = Json.Document(req)
  in 
   json
in
 fnGetList


4. finally the main query (note suggest to have a initial Id to limit the size of the result)

let
 fn = (initId) =>
  let
   token = fnGetTokenHeader(),
   I = if initId = "" then fnGetMaxId(token){0}[maxid] else initId,
   endPoint = "https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name={this is the account}&max_id=" & I & "&count=200",
   records = fnGetResults(token, endPoint),
   tbl = Table.FromList(records, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
   tblMaxIds = Table.ExpandRecordColumn(tbl, "Column1", {"id_str"}, {"maxid"}),
   minId = try List.Min(tblMaxIds[maxid]) otherwise null,
   recordSet = [records = records, nextId = minId]
  in
   recordSet,
 resultSet = List.Generate(
                    () => fn(""),
                    each Number.FromText(_[nextId]) >= 1000000000000000000, //an max id you want to put to limit result size, you should get a value by calling getMaxId function, otherwise it could return too many lists
     // each _[nextId] <> null,
                    each fn(_[nextId]),
                    each [nextId = _[nextId], records = _[records]]
                    )
in
    resultSet



2019-03-19

Handling Long File Name in Windows

Let's make the question short: on Windows platform, how can you copy the file if the path is too long?

This question is coming from a project I worked on recently. What happened is that quite a few Access Databases are dumped to a network shared folder, and we need to move them to a staging folder for data process. Because the access database is dumped automatically by a 3rd party software, some access databases were saved in an invalid file path (more than 260 characters in file's full name).

The solution is simple: RoboCopy. It is a standard component in Windows now and you can find an explanation of it here. So basically what we need to do, is listing all files from the source folder by time stamp, and then calling the RoboCopy to copy the file. Procedure is simple but just a few things need to have a look:

1. The FileInfo class acts differently by Windows platform
The SSIS package was built on Windows 7 platform (yes, 2019 already I know, but you won't believe how old client's infrastructure is.). On Windows 7 when calling FileInfo's full file name, there will be an IO.PathTooLongException. As demonstrated below, in the try catch block we extract file's name and path if the file's full name is too long, and then construct file's full name and pass it to RoboCopy.



So far so good, but the SSIS package eventually would be triggered on Windows 2012 platform. When we tested the package, the FileInfo's FullName actually returned the full path of the file, and the exception occurred at the directory level, which left us an unhandled exception: So files supposed to raise IO.PathTooLongException were passed to SSIS's File System Task, and obviously, SSIS's File System Task cannot handle File Name Too Long error.





Fixing the error is easy, but it is a good lesson: keeping dev and prod environment identical is critical.

2. RoboCopy can have more than one return code to represent success result.
When calling SSIS's Execute Process Task, only one return code is accepted by default, e.g. 0 by default is treated as success value, return code other than 0 will be treated as failure.

But when you look at RoboCopy's document, you can see RoboCopy's return code is a bitmap. In my situation, we need to set more than one return codes for success value.

We can easily fix this problem by following steps below:

In the Execute Process Task Configuration windows, change FailTaskIfReturnCodeIsNotSuccessValue to False. By setting it to false, Execute Process Task will always report success.


Now keep Execute Process Task selected and go to the Properties window (locate at the bottom-right corner by default, press Ctrl + W then P if you cannot find it), assign a package variable to ExecValueVariable. It will capture the execute result to the variable.



Now you can figure it out, instead of letting Execute Process Task report success or failure, we capture execution result to a variable, then in the Precedent Constraint we can use expression to identify the actual outcome of the RoboCopy.

That is today's post, good experience to find and fix these issues though at the beginning the task was looked so simple.













2019-01-22

SSAS Tabular Case Study: Change Cases of Member

Happy New Year, and I hope everyone all the best to 2019.

To begin with 2019, I just want to keep it simple. Let's see a small problem I found in SSAS tabular model recently: It could be very confusing when we need to change the case of a dimension member. To demonstrate this problem, I created a very simple table:

use master
go

create database demo
go

use demo
go

if exists (select 1 from sys.tables where name = 'product')
drop table product
go

create table PRODUCT
(
ProductKey int identity(1,1),
ProductName varchar(100),
Country varchar(100),
Constraint PK_Product Primary Key (ProductKey)
)
go

Insert into Product(ProductName, Country)
values
('BIKE', 'AU'),
('BIKE', 'UK'),
('BIKE', 'US'),
('BIKE', 'CN'),
('BIKE', 'NZ')
go

Select * from PRODUCT

As you can see it is a very simple table

And then I just created a tabular model based on this simple table.


Now I bring in the Power BI to visualise the product name by country

So far everything is good. But assume CN's manager wants to make some formatting changes: I want to see the product name in proper case, "Bike". Looks very simple, doesn't it? So we go to the database, update product name to proper case, re-process the model.

Update PRODUCT set ProductName = 'Bike' where Country = 'CN' 


Now refresh the report, nothing happens!


What happened? You can find the reason here. In short, it is because tabular model uses columnar storage mode, so for the same string value, it will only save the first instance regardless of its form. So in tabular model, "BIKE", "bike", "Bike", and "biKE" are all saved as one entry "BIKE". So now doesn't mater how many times you update your data, or re-process your tabular model, the report will always show "BIKE" as the product name.

To approve this, let's update all "BIKE" to "Bike" now, then re-process the model.


Now we go back to the PBI report and refresh it.



So here is the conclusion. When we design the data model, usually we will denormalise the data to build the dimension, which obviously, will generate lots of duplicated entries. When there is a requirement to show members in different forms, we need to be very careful due to the storage mode. In my practice, I would suggest to have a format string field build into the model.