Sunday, February 20, 2022

Extracting values from Kusto JSON columns in Power BI

Extracting values from Kusto JSON columns in PBI

 

What is the problem

 

In Kusto (aka Azure Data Explorer aka ADX) you can have columns in a table that contain JSON structures.

In KQL it is very easy to extract elements from these columns and use them as regular columns.

It requires more resources but overall, it is standard.

An example can be found in the table TransformedMetrics in the SampleMetrics databases in the help cluster.

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('SampleMetrics')]

TransformedMetrics

| project tags

| take 1



Table0

tags

{"host":"telegraf-568fbb676-rqmq6","measurement_db_type":"SQLServer","sql_instance":"sqlserver7","wait_category":"Other Disk IO","wait_type":"ASYNC_IO_COMPLETION"}

 

 

The tags columns contains inside it host, measurement_db_type ,sql_instance,wait_type and more

 

 

If we want to use the wait_time column in a query we can use:

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('SampleMetrics')]

TransformedMetrics

| extend Wait_Type=tostring(tags.wait_type)

| summarize Cn=count() by Wait_Type

| where Cn> 10000 and Wait_Type !=''

| order by Cn

 



Table0

Wait_Type

Cn

PREEMPTIVE_OS_DEVICEOPS

177,441

ASYNC_NETWORK_IO

14,276

PAGELATCH_SH

11,315

PAGELATCH_EX

11,086

 

 

Our task is to do a similar analysis in Power BI.

One way to do that is described by Chris Web in his blog

I want to present a different approach which IMHO is simpler

 

Using Value.NativeQuery to extract from JSON

 

This M function allows us to add a snippet of native code, in our case KQL, anywhere in the query generated by the connector based on the transformation you apply to the Kusto table.

KQL is a perfect match for this approach because a query in KQL is a string of operations each taking from the previous table and creating a new table.

This is exactly the way an M query is built and so by adding a step in M, we can add a step in the generated KQL query at any position.

There are many use cases for this pattern. Here I’ll describe one use case for extracting from a JSON column

In the KQL shown above, there is one operation we want to insert into the KQL generated by PBI but currently we don’t have a builtin way to do that.

Creating the basic M query

 

I’ll assume here that you already know how to connect to Kusto from PBI and select a table from the list of tables in a database.

Don’t enter any query in the connection dialog, just specify the cluster “help” and specify if you want import or direct query.

Navigate to the SampleMetrics database and select the table TransformedMetrics

Click transform data and in the Power Query editor you’ll see

DanyHoter_0-1645346686898.png

 

The tags column is shown as a record but attempting to see the content of the record will not work.

Adding a step

 

Click the Fx icon from the formula bar and a new step will be added with the formula

= TransformedMetrics1 which is just using the previous step.

Change the step from the formula bar (If you don’t see the formula bar check view/Formula Bar)

= Value.NativeQuery(TransformedMetrics1,"| extend Wait_Type=tostring(tags.wait_type)")

The function uses the previous step and in the second argument we use the exact KQL step we used in the explorer. If we need to extract more columns, we can do that in the same step

After applying the new step, the query editor will show a new column named Wait_Type

DanyHoter_1-1645346686906.png

 

 

 

 

Now we can continue adding steps to aggregate and  filter it and eventually we can get the same result as in the original KQL query.

The final query from view native query in the PQ editor is

 

["TransformedMetrics"]

| extend Wait_Type=tostring(tags.wait_type)

| where strcmp(["Wait_Type"], ("")) != 0

| summarize ["Count"]=count() by ["Wait_Type"]

| where (["Count"]) > ((10000))

 

Notice that there are two where steps, one to filter out the empty wait_type and one to after the summarize to select  types with count greater than a value.

In the attached example I use a dynamic  M parameter to specify the cutoff value.

Other uses of Value.NativeQuery

 

The same mechanism can be used to use summarize options like percentiles, special where options like regex matching and many other KQL features that do not have a match in M.

What about inserting the entire query into the M script?

 

This is also a valid option, but it is more difficult to maintain.

My recommendation is in any case that you have non-trivial KQL logic, put it in a function with or without parameters, and call the function from PBI .

In cases that the Kusto connector can generate most of the KQL for you, or you are not fluid with writing KQL just use PQ steps.

If one of the features you need is not available, the Value.NativeQuery function can help in injecting the minimum amount of KQL syntax in the right place.

In any case look at the generated KQL (View native query from the last step in the M query).

You can think about it as a KQL learning tool.

 

 

 

 

Posted at https://sl.advdat.com/3JwvLeihttps://sl.advdat.com/3JwvLei