Wednesday, December 8, 2021

Direct Query from Excel to Azure Data Explorer (aka Kusto)

Direct Query from Excel to Azure Data Explorer (aka Kusto)

 

Summary

 

If you are familiar with Power BI, you know that you can connect to data sources in two modes – Import and Direct Query.

Power Query in Excel doesn’t offer this choice the only option is import.

You can import to a table or a pivot table and of course you can import to the Excel data model.

In this article I’ll show how can Kusto data be queried in a similar way to DQ with Dynamic M parameters in PBI

Different methods can be used in Excel to select parameter values and feed them into queries that will be send to the Kusto backend.

Summary results will be returned and can be analyzed further with tools like pivot tables.

All this can be achieved without using VBA or any other extensibility methods. Only Power Query and Excel formulas.

 

Target report

 

I am going to use the table StormEvents which can be found in the help cluster.

The data is free to use and to learn how to query data using the Kusto Query Language (KQL).

 The StormEvents contains information about weather events that occurred in the US in 2007.

It is not a big table, and we could easily bring all the data into Excel and continue from there.

However, I will bring summary data from the table in the same way I could do it in a table with many billions of rows.

A real-life table will probably be ingesting new data all the time so querying it in real time will be essential for using the latest data.

The end report will be a pivot table. The report will initially show the selected Event Types on columns, the States on rows and the number of events in values.

You can explore other configurations of the pivot once the data is imported.

You can also add the data to the Excel data model and add more data from other sources.

After we select our parameter values, we can click on refresh all and the pivot will be refreshed based on the values selected in Excel

 

Building the report

 

Bringing the list of Event Types

 

Our first query will bring the list of Event Types from the table.

From the data ribbon, click on get-data and select Azure Data Explorer as source

DanyHoter_17-1637129755577.png

 

 

You need to enter the address of the cluster and click OK

 

DanyHoter_18-1637129755580.png

 

 

 

Now you see the contents of the cluster, select StormEvents from the Samples database

DanyHoter_19-1637129755582.png

 

 

 

Choose Transform Data to open the Power Query editor

 

Remove all columns except the EventType column

DanyHoter_20-1637129755584.png

 

 

 

Now remove duplicate rows so we’ll have a list of all unique Event Types

 

DanyHoter_21-1637129755586.png

 

If you want to see the KQL query that was generated so far you can right click the last step and use View Native Query

DanyHoter_22-1637129755587.png

 

 

 

This is the query

DanyHoter_23-1637129755588.png

 

Rename the query to Event Types

Now select close and load to and select the option to bring the data into a PivotTable

DanyHoter_24-1637129755590.png

 

 

 

Drag the EventType column to rows

Select a subset of the Event Types using search or clicking individual types.

You can also insert a slicer to select types in a visual way.

Create the query that builds the list of selected Event Types

 

Select the sheet column which contains the pivot and give it the name TypeList.

We need a named range to build a query.

Once in the editor we need to create a list of all selected values

We filter out the unnecessary values like null, Row Labels and Grand Total

DanyHoter_25-1637129755592.png

 

 

 

We need to convert the table to a list.

From the column header use right click/Drill Down. This step converts the table into a list that can be used to filter.

Now use “close and load to” and select the option: only create connection.

Pulling data based on the selected Event Types

 

From Excel use Recent Sources from the get & transform area in the Data ribbon

Select the connection just created to the Help cluster.

Select the StormEvents table again

Click transform data and we are back in the PQ editor

Now we need to filter the data based on the list of selected event types and aggregate the data.

The data in a real case is probably way too large to bring into Excel so we need the backend to both filter and aggregate the data for us.

First add a step to filter the EventType column by any value

The step will show in this way

DanyHoter_26-1637129755593.png

 

You need to edit this step in the formula bar to use the list for filtering

DanyHoter_27-1637129755593.png

 

Pay attention to the right number of parentheses.

Select the StartTime and convert it to a date with the first day of the month.

DanyHoter_28-1637129755596.png

 

Rename the column to Month and change its type to date.

Now it is time to aggregate the data using Transform/Group By

Here is a suggestion for the Group By settings, you can select more or less columns to aggregate

DanyHoter_29-1637129755597.png

 

Now it is a good time to look at the KQL syntax generated based on the applied step.

Note that if the Show Native Query option is disabled, it means that one of the steps we used could not be translated (aka folded) to KQL. This would be a problem because we want the backend to do all the heavy load.

If you follow the steps so far, the query should be fully folded and ready to be send to Kusto.

Use close and load to and choose the same options as in the first query to create a new pivot next to the existing one with the list of types.

Once you see the pivot table, you can decide what columns to use on rows, columns and values

You can filter further the data to see certain States, Event Types etc.

Notice that this filtering is done by the pivot itself based on the data brought in from Kusto

If you want to bring data about a different set of event types, you should select other types from the pivot on the left and click data/refresh all.

A new query will be generated, send to Kusto, and populate the pivot table.

Filtering the data with a value form a single cell

 

Let’s assume that we are interested only in events that cause serious damage, as measured by the DamageProperty aggregate value.

We can do it using the pivot filtering capabilities, but what if we want to apply the filter as part of the query so the filtering takes place in the backend and only relevant data is brought to Excel.

Make some room in the sheet with the two pivot tables and allocate a cell for the parameter that will contain the minimum damage value to make the event interesting for us.

Name the cell DamageLimit and create a new query from sheet.

From the single column header use Drill Down to convert the single value table into a single value list.

Enter again the query that brings the aggregated data

After the group by step, add a filter step on the DamageProperty column.

At first, use a filter on a constant value

DanyHoter_30-1637129755598.png

 

Now edit the formula

DanyHoter_31-1637129755599.png

 

Now the filter uses the value that came from the Excel cell.

Close and load to and choose “only create connection”. This setting refers to the newly created query from the cell.

Now you can choose any numeric value for the damage limit and click data/Refresh All

 

Using ADX functions instead of starting from a simple table

 

Functions are KQL statements that can accept parameters, execute complex logic and return a table.

It is a good idea to create functions in cases where the same logic needs to be reused in multiple reports or where KQL features that cannot be expressed with Power Query are required.

Functions can be created and tested in Kusto explorer and later they appear in the navigator together with regular tables.

Let’s do an example with the same logic we implemented so far but this time using a function.

I used the KQL generated in Powe Query and turn it into a function.

If you want to follow, you need access to a database in which you have permissions to create new functions. You can’t do it in the Samples database in the Help cluster.

I use a syntax to refer to the StormEvents table on a different database and cluster than the one where the function is stored.

This is the syntax to create the function

 

.create-or-alter function StormStats(Types:dynamic,DamageLimit:long)

{

  cluster("https://help.kusto.windows.net").database("Samples").["StormEvents"]

| where ["EventType"] in(Types)

| project ["StartTime"],["State"],["EventType"],["DamageProperty"],["DamageCrops"]

| extend ["Month"]=startofmonth(["StartTime"])

| project ["State"],["EventType"],["DamageProperty"],["DamageCrops"],["Month"]

| summarize ["Events"]=count(), ["DamageP"]=sum(["DamageProperty"]), ["DamageC"]=sum(["DamageCrops"]) by ["EventType"], ["State"], ["Month"]

| extend ["Damage"]=(["DamageP"]) + (["DamageC"])

| project ["EventType"],["State"],["Month"],["Events"],["Damage"]

| where (["Damage"]) >= DamageLimit

}

 

The two parameters correspond to the values we take from the Excel grid.

When we connect to the cluster we’ll see in the navigator

DanyHoter_32-1637129755601.png

 

Tables and functions. We select the new function StormStats , in the preview pane we can enter some values for the parameters, see the preview and click Transform Data.

 

 

DanyHoter_33-1637129755604.png

 

Once in the editor we need to add a step to the TypeList query we created before 

In order for the list to be able to be used as a dynamic value in the function, we need to insert a new step. From the last step in applied steps right click/Insert Step After.

The new step will show as =Column1 and you need to change it to

= Text.Combine({"['", Text.Combine(Column1, "','") , "']"})

This transformation will convert the list into a text like

['Cold/Wind Chill','Extreme Cold/Wind Chill','High Wind','Marine High Wind','Marine Strong Wind','Marine Thunderstorm Wind','Strong Wind','Thunderstorm Wind']

Which will work well as an argument to the function.

The rest is the same as before, the query can return a pivot table and be used instead of the existing pivot.

The original query will not work with the new version of the TypeList query.

If you are not fluent in KQL, you can create the query using Power Query and once the query is ready, copy the generated KQL and encapsulate it in a function. The advantage will be maintenance of the query in the database and easier reuse of the logic for multiple reports

 

 

 

Posted at https://sl.advdat.com/3EwkZ5A