Extracting values from Kusto JSON columns in PBI
What is the problem
In Kusto (aka Azure Data Explorer aka ADX) you can have columns in a table that contain JSON structures.
In KQL it is very easy to extract elements from these columns and use them as regular columns.
It requires more resources but overall, it is standard.
An example can be found in the table TransformedMetrics in the SampleMetrics databases in the help cluster.
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('SampleMetrics')]
TransformedMetrics
| project tags
| take 1
Table0
tags |
{"host":"telegraf-568fbb676-rqmq6","measurement_db_type":"SQLServer","sql_instance":"sqlserver7","wait_category":"Other Disk IO","wait_type":"ASYNC_IO_COMPLETION"} |
The tags columns contains inside it host, measurement_db_type ,sql_instance,wait_type and more
If we want to use the wait_time column in a query we can use:
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('SampleMetrics')]
TransformedMetrics
| extend Wait_Type=tostring(tags.wait_type)
| summarize Cn=count() by Wait_Type
| where Cn> 10000 and Wait_Type !=''
| order by Cn
Table0
Wait_Type |
Cn |
PREEMPTIVE_OS_DEVICEOPS |
177,441 |
ASYNC_NETWORK_IO |
14,276 |
PAGELATCH_SH |
11,315 |
PAGELATCH_EX |
11,086 |
Our task is to do a similar analysis in Power BI.
One way to do that is described by Chris Web in his blog
I want to present a different approach which IMHO is simpler
Using Value.NativeQuery to extract from JSON
This M function allows us to add a snippet of native code, in our case KQL, anywhere in the query generated by the connector based on the transformation you apply to the Kusto table.
KQL is a perfect match for this approach because a query in KQL is a string of operations each taking from the previous table and creating a new table.
This is exactly the way an M query is built and so by adding a step in M, we can add a step in the generated KQL query at any position.
There are many use cases for this pattern. Here I’ll describe one use case for extracting from a JSON column
In the KQL shown above, there is one operation we want to insert into the KQL generated by PBI but currently we don’t have a builtin way to do that.
Creating the basic M query
I’ll assume here that you already know how to connect to Kusto from PBI and select a table from the list of tables in a database.
Don’t enter any query in the connection dialog, just specify the cluster “help” and specify if you want import or direct query.
Navigate to the SampleMetrics database and select the table TransformedMetrics
Click transform data and in the Power Query editor you’ll see
The tags column is shown as a record but attempting to see the content of the record will not work.
Adding a step
Click the Fx icon from the formula bar and a new step will be added with the formula
= TransformedMetrics1 which is just using the previous step.
Change the step from the formula bar (If you don’t see the formula bar check view/Formula Bar)
= Value.NativeQuery(TransformedMetrics1,"| extend Wait_Type=tostring(tags.wait_type)")
The function uses the previous step and in the second argument we use the exact KQL step we used in the explorer. If we need to extract more columns, we can do that in the same step
After applying the new step, the query editor will show a new column named Wait_Type
Now we can continue adding steps to aggregate and filter it and eventually we can get the same result as in the original KQL query.
The final query from view native query in the PQ editor is
["TransformedMetrics"]
| extend Wait_Type=tostring(tags.wait_type)
| where strcmp(["Wait_Type"], ("")) != 0
| summarize ["Count"]=count() by ["Wait_Type"]
| where (["Count"]) > ((10000))
Notice that there are two where steps, one to filter out the empty wait_type and one to after the summarize to select types with count greater than a value.
In the attached example I use a dynamic M parameter to specify the cutoff value.
Other uses of Value.NativeQuery
The same mechanism can be used to use summarize options like percentiles, special where options like regex matching and many other KQL features that do not have a match in M.
What about inserting the entire query into the M script?
This is also a valid option, but it is more difficult to maintain.
My recommendation is in any case that you have non-trivial KQL logic, put it in a function with or without parameters, and call the function from PBI .
In cases that the Kusto connector can generate most of the KQL for you, or you are not fluid with writing KQL just use PQ steps.
If one of the features you need is not available, the Value.NativeQuery function can help in injecting the minimum amount of KQL syntax in the right place.
In any case look at the generated KQL (View native query from the last step in the M query).
You can think about it as a KQL learning tool.
Posted at https://sl.advdat.com/3JwvLeihttps://sl.advdat.com/3JwvLei