Extracting values from Kusto JSON columns in PBI

What is the problem

In Kusto (aka Azure Data Explorer aka ADX) you can have columns in a table that contain JSON structures.

In KQL it is very easy to extract elements from these columns and use them as regular columns.

It requires more resources but overall, it is standard.

An example can be found in the table TransformedMetrics in the SampleMetrics databases in the help cluster.

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('SampleMetrics')]

TransformedMetrics

| project tags

| take 1

Table0

tags

{"host":"telegraf-568fbb676-rqmq6","measurement_db_type":"SQLServer","sql_instance":"sqlserver7","wait_category":"Other Disk IO","wait_type":"ASYNC_IO_COMPLETION"}

The tags columns contains inside it host, measurement_db_type ,sql_instance,wait_type and more

If we want to use the wait_time column in a query we can use:

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('SampleMetrics')]

TransformedMetrics

| extend Wait_Type=tostring(tags.wait_type)

| summarize Cn=count() by Wait_Type

| where Cn> 10000 and Wait_Type !=''

| order by Cn

Table0

Wait_Type	Cn
PREEMPTIVE_OS_DEVICEOPS	177,441
ASYNC_NETWORK_IO	14,276
PAGELATCH_SH	11,315
PAGELATCH_EX	11,086

Our task is to do a similar analysis in Power BI.

One way to do that is described by Chris Web in his blog

I want to present a different approach which IMHO is simpler

Using Value.NativeQuery to extract from JSON

This M function allows us to add a snippet of native code, in our case KQL, anywhere in the query generated by the connector based on the transformation you apply to the Kusto table.

KQL is a perfect match for this approach because a query in KQL is a string of operations each taking from the previous table and creating a new table.

This is exactly the way an M query is built and so by adding a step in M, we can add a step in the generated KQL query at any position.

There are many use cases for this pattern. Here I’ll describe one use case for extracting from a JSON column

In the KQL shown above, there is one operation we want to insert into the KQL generated by PBI but currently we don’t have a builtin way to do that.

Creating the basic M query

I’ll assume here that you already know how to connect to Kusto from PBI and select a table from the list of tables in a database.

Don’t enter any query in the connection dialog, just specify the cluster “help” and specify if you want import or direct query.

Navigate to the SampleMetrics database and select the table TransformedMetrics

Click transform data and in the Power Query editor you’ll see

The tags column is shown as a record but attempting to see the content of the record will not work.

Adding a step

Click the Fx icon from the formula bar and a new step will be added with the formula

= TransformedMetrics1 which is just using the previous step.

Change the step from the formula bar (If you don’t see the formula bar check view/Formula Bar)

= Value.NativeQuery(TransformedMetrics1,"| extend Wait_Type=tostring(tags.wait_type)")

The function uses the previous step and in the second argument we use the exact KQL step we used in the explorer. If we need to extract more columns, we can do that in the same step

After applying the new step, the query editor will show a new column named Wait_Type

Now we can continue adding steps to aggregate and filter it and eventually we can get the same result as in the original KQL query.

The final query from view native query in the PQ editor is

["TransformedMetrics"]

| extend Wait_Type=tostring(tags.wait_type)

| where strcmp(["Wait_Type"], ("")) != 0

| summarize ["Count"]=count() by ["Wait_Type"]

| where (["Count"]) > ((10000))

Notice that there are two where steps, one to filter out the empty wait_type and one to after the summarize to select types with count greater than a value.

In the attached example I use a dynamic M parameter to specify the cutoff value.

Other uses of Value.NativeQuery

The same mechanism can be used to use summarize options like percentiles, special where options like regex matching and many other KQL features that do not have a match in M.

What about inserting the entire query into the M script?

This is also a valid option, but it is more difficult to maintain.

My recommendation is in any case that you have non-trivial KQL logic, put it in a function with or without parameters, and call the function from PBI .

In cases that the Kusto connector can generate most of the KQL for you, or you are not fluid with writing KQL just use PQ steps.

If one of the features you need is not available, the Value.NativeQuery function can help in injecting the minimum amount of KQL syntax in the right place.

In any case look at the generated KQL (View native query from the last step in the M query).

You can think about it as a KQL learning tool.

Posted at https://sl.advdat.com/3JwvLeihttps://sl.advdat.com/3JwvLei

Advanced Data Solutions

Sunday, February 20, 2022

Extracting values from Kusto JSON columns in Power BI