Thursday, July 29, 2021

Use Custom DAX to create a Linear Regression Trendline with a Dynamic Date Hierarchy in Power BI

Custom Linear Regression DAX expressions give you insights into all components of the Y = MX + B equation.Custom Linear Regression DAX expressions give you insights into all components of the Y = MX + B equation.

 

Recently I was asked to provide an example of a custom Linear Regression Trendline in Power BI that can dynamically re-calculate for both different levels of a Date hierarchy along with different filter selections. After designing and creating the example, I thought it would be useful to share with the Community. Linear trendlines can be useful for trendspotting, forecasting, and pattern identification in data. In this article (and related video tutorial) I'll cover the following:

  1. Add a Linear Regression Trendline to a Dataset and Report using custom DAX
  2. Visualize and interpret the DAX calculations
  3. Provide a link to a sample PBIX file with a copy of the solution
  4. (Video) Discuss how to extend the Linear Regression Trendline calculations to uncover trends within the data and visualize patterns at scale

The approach used in this example could be applied to any data on a date axis, for any industry. In Healthcare (this is a Healthcare Blog) some examples might include Average Length of Stay, Complications, Falls, Performance Metric Rates, Supply Chain Metrics, and more. These dynamic calculations can be used to simply visualize trends, or to bubble up specific sub-categories of data that are sharply trending up or down over time. For a metric such as Average Length of Stay, users could browse the Power BI report to compare trends over time by Department, DRG, Floor, Primary Diagnoses, Location, Day of Week, etc and find areas in need of attention. 

 

Before reading any further, if you are looking for a simple trendline on a chart then there is an out-of-the-box option that only takes a few clicks (click here). If you'd like to do advanced analytics, have access to the underlying calculations with a custom version, and compare different slices of the data interactively to bubble up opportunities then keep reading.

 

For the purpose of this example, a linear regression trendline will be calculated using hierarchical values on a Date axis. The aggregated values for each member of the Date axis will be used to calculate the equation of a linear regression trendline such that Y = MX + B:

  • Y is the y axis value of the trendline at each Date interval.
  • M is the slope of the trendline.
  • X is the order of the value on the x axis.
  • B is the point where the trendline starts on the y axis.

The Linear Regression Trendline will works for Weeks, Months, Quarters, and Years depending on the level you choose to view in a report. Adjusting the level of Date aggregation can be helpful when your data has both smaller sample sizes that need to be aggregated over broader periods of time, or robust sample sizes that need to be analyzed daily or weekly.

 

Prerequisites

  1. This demo can be built in either Power BI Desktop, Azure Analysis Services, SQL Server Analysis Services, or Excel PowerPivot Tabular Models.
  2. You'll need a good Date Table as part of your Model. I provide free scripts for Date tables at this link: https://github.com/gregbeaumont/PowerPopHealth/tree/main/Date%20%26%20Time%20Scripts 
  3. You'll also need a Fact table with calculations having a Date key. The example I've used for the demo can be replicated from this link (real CDC Population Weighted UV Irradiance data): https://github.com/gregbeaumont/PowerPopHealth/tree/main/CDC 
  4. For this demo, I added a new column to the Date Table for the [Week Index]. This column is only necessary if you want to have trendlines at the weekly level. The [Week Index] starts at 1 for the earliest week in your Date table, and increases each week by one over the course of all weeks in the Date table. Date aggregations other than Week (Month, Quarter, Year) support Time Intelligence DAX expressions and such an Index is not needed.

A link to download my full demo PBIX file for this article is herehttps://github.com/gregbeaumont/PowerPopHealth/tree/main/Demos 

 

A video detailing how the regression will work and can provide value is embedded after the DAX expressions, or you can view it here.

 

Linear Regression Trendline DAX Expressions

 

This example will build a linear regression trendline for a calculated value over the course of Years, Quarters, Months or Weeks that are plotted on the x axis. I used an Average of i380 ([i380 Avg] is a measurement of radiation taken daily for the CDC in every US County) for the calculated value. You can replace [i380 Avg] with your own calculation if you add your own fact table to the solution.

 

I chose the CDC data for the demo because it was robust, real, and I had it ready to go for another project. The purpose of this demo is not to extract insights from the sample data, but to prove out the method for calculating a linear regression using DAX. An astute analyst will probably notice that a nonlinear regression would work better with this CDC data since solar radiation levels are seasonal.

 

I want to add a few quick notes about the DAX measures in this solution. The code snippets below look complicated, but they consist of a few patterns that repeat. Also, the DAX measures perform well on the sample data containing 12.3 million rows of daily measurements for every County in the USA. There might be some additional opportunities to improve the query performance using variables and calculation groups, any suggestions are welcome in the comments.

 

Create each of these DAX Measures in the following order:

  1. Run Count (X) = //* A calculation to count the number of measurements that are included on the axis
    var MinDate = CALCULATE(MIN('CDC Pop Rated UV Irradiance'[Date]),ALLSELECTED('Date')) //* A variable to pre-calculate the earliest Date
    RETURN
    SWITCH(TRUE(),
    ISFILTERED('Date'[Week Year]),CALCULATE(DISTINCTCOUNT('Date'[Week Year]),FILTER(ALL('Date'[Date]),[Date] <= MAX('CDC Pop Rated UV Irradiance'[Date]) && [Date] >= MinDate)), //* Calculate the number of weeks leading up to and including the current week.
    ISFILTERED('Date'[Month Year]),CALCULATE(DISTINCTCOUNT('Date'[Month Year]),FILTER(ALL('Date'[Date]),[Date] <= MAX('CDC Pop Rated UV Irradiance'[Date]) && [Date] >= MinDate)), //* Calculate the number of months leading up to and including the current month.
    ISFILTERED('Date'[Quarter Year]),CALCULATE(DISTINCTCOUNT('Date'[Quarter Year]),FILTER(ALL('Date'[Date]),[Date] <= MAX('CDC Pop Rated UV Irradiance'[Date]) && [Date] >= MinDate)), //* Calculate the number of quarters leading up to and including the current quarter.
    ISFILTERED('Date'[Year]),CALCULATE(DISTINCTCOUNT('Date'[Year]),FILTER(ALL('Date'[Date]),[Date] <= MAX('CDC Pop Rated UV Irradiance'[Date]) && [Date] >= MinDate)), //* Calculate the number of years leading up to and including the current year.
    BLANK() //* Return a blank if other columns are used on the axis. )
  2. i380 Avg = AVERAGE([i380]) //* An average of the values in a column on the Fact table. This could also be a SUM or some other sort of calculated measure that you are putting on an axis.
  3. i380 Avg Prev = //* A calculation of the previous calculation on the axis. 
    var WeekIndexPrev = MAX('Date'[Week Index]) - 1 //* A variable for the Index of the previous week. Time Intelligence doesn't work at the week level, so this is only needed if you want to have weekly values on the axis.
    RETURN
    SWITCH(TRUE(),
    [Run Count (X)] < 1,BLANK(), //* Don't show any values if either 1) the current member on the axis is the first one, or 2) the current member is not on the axis of the visual.
    ISFILTERED('Date'[Week Year]),CALCULATE([i380 Avg],FILTER(ALL('Date'),'Date'[Week Index] = WeekIndexPrev)), //* Calculate the i380 Avg for the previous week.
    ISFILTERED('Date'[Month Year]),CALCULATE([i380 Avg],PREVIOUSMONTH('Date'[Date])), //* Calculate the i380 Avg for the previous month.
    ISFILTERED('Date'[Quarter Year]),CALCULATE([i380 Avg],PREVIOUSQUARTER('Date'[Date])), //* Calculate the i380 Avg for the previous quarter.
    ISFILTERED('Date'[Year]),CALCULATE([i380 Avg],PREVIOUSYEAR('Date'[Date])), //* Calculate the i380 Avg for the previous year.
    BLANK() //* Return a blank if other columns are used on the axis.
    )
  4. Rise = //* Calculate the delta (change) of the calculated measure between the current and previous Months on the axis.
    SWITCH(TRUE(),
    [Run Count (X)] <= 1, BLANK(), //* Don't show any values if either 1) the current member on the axis is the first one, or 2) the current member is not on the axis of the visual.
    [i380 Avg] - [i380 Avg Prev])
  5. Avg Rise (M) = //* Calculate the average rise for the entire series of data. 
    SWITCH(TRUE(),
    [Run Count (X)] < 1,BLANK(), //* Don't show any values if either 1) the current member on the axis is the first one, or 2) the current member is not on the axis of the visual.
    ISFILTERED('Date'[Week Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Week Year]),[Rise]),ALLSELECTED('Date')), //* Calculate the average rise at the week level for all dates that have been selected.
    ISFILTERED('Date'[Month Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Month Year]),[Rise]),ALLSELECTED('Date')), //* Calculate the average rise at the month level for all dates that have been selected.
    ISFILTERED('Date'[Quarter Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Quarter Year]),[Rise]),ALLSELECTED('Date')), //* Calculate the average rise at the quarter level for all dates that have been selected.
    ISFILTERED('Date'[Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Year]),[Rise]),ALLSELECTED('Date')), //* Calculate the average rise at the year level for all years that have been selected.
    BLANK() //* Return a blank if other columns are used on the axis.
    )
  6. Y Intercept (B) = //* Calculating the Y intercept, which is the value of the regression line when it crosses zero on the x axis.
    SWITCH(TRUE(),
    [Run Count (X)] < 1,BLANK(), //* Don't show any values if either 1) the current member on the axis is the first one, or 2) the current member is not on the axis of the visual.
    ISFILTERED('Date'[Week Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Week Year]),([i380 Avg] - ([Avg Rise (M)] * [Run Count (X)]))),ALLSELECTED('Date')), //* Calculate the Y intercept at the week level (the B in Y = MX + B) by 1) calculating the (Rise * Run) and then subtracting it from the i380 average and 2) averaging all of those values.
    ISFILTERED('Date'[Month Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Month Year]),([i380 Avg] - ([Avg Rise (M)] * [Run Count (X)]))),ALLSELECTED('Date')), //* Calculate the Y intercept at the month level (the B in Y = MX + B) by 1) calculating the (Rise * Run) and then subtracting it from the i380 average and 2) averaging all of those values.
    ISFILTERED('Date'[Quarter Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Quarter Year]),([i380 Avg] - ([Avg Rise (M)] * [Run Count (X)]))),ALLSELECTED('Date')), //* Calculate the Y intercept at the quarter level (the B in Y = MX + B) by 1) calculating the (Rise * Run) and then subtracting it from the i380 average and 2) averaging all of those values.
    ISFILTERED('Date'[Year]),CALCULATE(AVERAGEX(SUMMARIZE('Date','Date'[Year]),([i380 Avg] - ([Avg Rise (M)] * [Run Count (X)]))),ALLSELECTED('Date')), //* Calculate the Y intercept at the year level (the B in Y = MX + B) by 1) calculating the (Rise * Run) and then subtracting it from the i380 average and 2) averaging all of those values.
    BLANK() //* Return a blank if other columns are used on the axis.
    )
  7. Regression (Y) = ([Avg Rise (M)] * [Run Count (X)]) + [Y Intercept (B)] //* Complete the (Y = MX + B) equation for your regression! 

 

Here is a diagram detailing how the above seven DAX expressions interact with each other as you stack them together like building blocks: 

Custom DAX Calculations are used as components of other DAX calculations to complete the Equation and make it dynamic for multidimensional queries.Custom DAX Calculations are used as components of other DAX calculations to complete the Equation and make it dynamic for multidimensional queries.

 

Once you've added these custom DAX expressions to your solution you can not only plot a Regression line at multiple levels of a Date Hierarchy, but also bubble up areas of opportunity in your data. In the example below, Counties can be bubbles up that have the most dramatic increase in [Avg Rise (M)] which means that they have the greatest upward trend over time.

Linear Regression Line Slope (M) can be used to Bubble Up Steeply Increasing or Decreasing slices of data.Linear Regression Line Slope (M) can be used to Bubble Up Steeply Increasing or Decreasing slices of data.

 In Healthcare you could bubble up the trends of measures such as Length of Stay, Complication Rates, Fall Rates, Compliance Rates, Discharge by Noon Rates to answer questions such as:

  • Which Specialties are showing the greatest upwards or downwards trends?
  • What Floors of the building are trending?
  • Which DRGs are trending the wrong way? What if we drill into the data and compare the trends of the bad DRGs by Shift or Primary Diagnoses?

Here is a video describing the process and utility of a custom linear regression trendline using DAX in Power BI: 

 

Posted at https://sl.advdat.com/2TGuCwm