Monday, May 9, 2022

Logic Apps Standard Performance Benchmark - Burst workloads

 

In collaboration with Henry Liu and Rohitha Hewawasam

 

Intro

We are starting a series of blog posts to benchmark common Logic App patterns.

For the first post in the series, we will look at an asynchronous burst load. In this scenario, an HTTP request carrying a batch of messages (up to 100k) invokes a process that fans out and processes the messages in parallel. It is based on a real-life scenario implemented by one of our enterprise customers, who was kind enough to allow us to use it to benchmark various performance characteristics of Logic Apps Standard across the WS1, WS2, and WS3 plans.

 

Workflows

 

The scenario is implemented using a single Logic App Standard App, which contains two workflows:

  1. Dispatcher – a stateful workflow using an HTTP trigger with SplitOn configured. It receives an array of messages in the request body to split on and, for each message, invokes a stateless child workflow (Enricher).
  2. Enricher – a stateless workflow used for data processing. It contains data composition, data manipulation with inline JavaScript, outbound HTTP calls to a Function App, and various control statements.

Making the parent workflow stateful facilitates scaling out to multiple instances and distributing parent workflow runs across them. Making the data processing workflow stateless allows messages to be processed with lower latency while achieving higher throughput.

 

The diagram below represents the scenario at a conceptual level:

 

Asynchronous burst workload conceptual design

 

You can see the details of each workflow implementation in the diagrams below:

 

Dispatcher logic app workflow

 

Enricher logic app workflow

 

 

Results

 

The diagrams below represent a series of metrics collated to understand the performance of each Logic Apps Standard App Service plan. You should be able to reproduce these results by following the instructions in the GitHub repository and collating your own benchmark measurements from Application Insights. Some notes:

 

  • Sending telemetry to App Insights incurs a performance penalty. As we wanted to get the most out of the App Service plans, the data collated below was gathered with App Insights disabled, using internal telemetry, so your collated data might vary slightly. For more information on how to run queries and analyze your own data, take a look at the series of Kusto queries available in the README file of the GitHub repository; a starter query is also sketched after this list.
  • The Logic App Standard implementation has the default scale-out settings enabled, so the data below reflects auto-scaling from the application.
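
As a rough starting point, a query along the following lines counts workflow runs per minute. This is a minimal sketch, assuming workflow runs are recorded in the Application Insights requests table when App Insights is enabled; the authoritative queries live in the repository's README:

```kusto
// Minimal sketch: workflow runs per minute, split by workflow name.
// Assumes runs land in the Application Insights 'requests' table;
// adjust table and field names to match the queries in the repository README.
requests
| where timestamp > ago(2h)                     // window covering the burst test
| summarize runs = count() by name, bin(timestamp, 1m)
| render timechart
```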

 

Execution Elapsed Time

 

The chart below represents the total time taken to process a batch of 40k and 100k messages, respectively, when using a WS1, WS2, or WS3 App Service plan:

 

Execution Elapsed Time
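
If you are collating your own measurements, a simple way to approximate the elapsed time is to take the span between the first and last runs of the burst. A minimal sketch, assuming runs are recorded in the requests table under the workflow name used in this scenario:

```kusto
// Sketch: approximate total elapsed time for a burst as the span between
// the first and last Enricher runs. 'Enricher' is the workflow name from
// this scenario; substitute your own.
requests
| where timestamp > ago(2h)
| where name == "Enricher"
| summarize elapsed = max(timestamp) - min(timestamp)
```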

 

Scaling profile

 

The scaling profile shows how each App Service plan scaled over time to meet the workload burst, using the default settings for elastic scale.

 

Scaling Profile per App Service Plan - 40K Messages

 

 

 

Scaling Profile per App Service Plan - 100K Messages
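
A scaling profile like the ones above can be approximated from telemetry by counting the distinct role instances emitting records in each time bin; a minimal sketch, assuming the requests table is populated:

```kusto
// Sketch: approximate the scaling profile by counting distinct role
// instances emitting telemetry per minute.
requests
| where timestamp > ago(2h)
| summarize instances = dcount(cloud_RoleInstance) by bin(timestamp, 1m)
| render timechart
```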

 

 

 

The diagrams above show some interesting points:

 

  • In both cases, the scaling engine responds quite quickly to the increase in load, adding more machines to support it.
  • The number of instances needed is inversely proportional to the size of each instance and scales relatively linearly between plans. This is expected, as both CPU and memory grow proportionally between plans (WS3 is twice the compute of WS2, which is twice the compute of WS1).
  • Compared to the Integration Service Environment (ISE), scaling for Logic Apps Standard happens much faster, as we are dealing with smaller instances that can be spun up much more quickly. The same can be said of scaling back in.

The table below shows the average and peak instance counts needed on each plan to meet the 40k and 100k workloads, respectively:

 

| Instance type | 40k messages: # instances (avg) | 40k messages: # instances (peak) | 100k messages: # instances (avg) | 100k messages: # instances (peak) |
|---|---|---|---|---|
| WS1 | 13.3 | 20 | 18.6 | 27 |
| WS2 | 8.5 | 10 | 12.4 | 18 |
| WS3 | 5 | 7 | 7.5 | 13 |
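
Figures like these can be derived from the same per-minute instance counts used for the scaling profile; a minimal sketch:

```kusto
// Sketch: average and peak instance counts over the test window,
// derived from per-minute distinct role-instance counts.
requests
| where timestamp > ago(2h)
| summarize instances = dcount(cloud_RoleInstance) by bin(timestamp, 1m)
| summarize avgInstances = avg(instances), peakInstances = max(instances)
```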

 

Execution Throughput

 

When it comes to execution throughput, we can analyze it from two points of view:

  • Action execution throughput
  • Workflow execution throughput

We decided to focus on action execution throughput, which should help you estimate the throughput you would see for your own workflows, as they can be quite different in complexity from the example presented. For completeness, we will include queries to calculate workflow executions later.

 

The table below shows the total number of actions executed for each burst workload, as well as the number per workflow execution. This should allow you to analyze throughput and draw comparisons with your own scenarios and workloads.

 

| Workflow type | # Actions (40k messages): Total | # Actions (40k messages): Per execution | # Actions (100k messages): Total | # Actions (100k messages): Per execution |
|---|---|---|---|---|
| Dispatcher (stateful) | 80,000 | 2 | 200,000 | 2 |
| Enricher (stateless) | 1,080,000 | 27 | 2,700,000 | 27 |

 

The set of diagrams below shows action execution throughput, highlighting peak, average, and progress over time:

 

Actions Throughput (40k messages) - Average x Peak

Actions Throughput (40k messages) - Timeline

 

 

Actions Throughput (100k messages) - Average x Peak

Actions Throughput (100k messages) - Timeline

 

 

Once again, as the App Service plan size grows, throughput increases proportionally, for both peak and average values.
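
To derive comparable average and peak throughput figures from your own runs, you can bucket action completion events per second. This is a hedged sketch: the category string below is an assumption about where action events land in your telemetry, so check the repository README for the exact queries:

```kusto
// Sketch: per-second action throughput, then its average and peak.
// The 'Category' value below is an assumption about how action events
// are tagged; the real queries are in the repository README.
traces
| where timestamp > ago(2h)
| where customDimensions.Category == "Workflow.Operations.Actions"
| summarize actionsPerSecond = count() by bin(timestamp, 1s)
| summarize avgThroughput = avg(actionsPerSecond), peakThroughput = max(actionsPerSecond)
```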

 

Execution Delay (95th percentile)

 

Execution delay is the difference between when a job is scheduled to be executed and when it is actually executed. It is important to note that this value will never be zero; a higher execution delay indicates that system resources are busy and jobs have to wait longer before they can be executed.
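
A 95th-percentile view like the ones charted below can be computed with KQL's percentile aggregation. The custom dimension name in this sketch is hypothetical, used purely for illustration; the real field names are in the repository README:

```kusto
// Sketch: 95th percentile execution delay per minute.
// 'jobExecutionDelayMs' is a hypothetical custom dimension name used
// for illustration; check the repository README for the real queries.
traces
| where timestamp > ago(2h)
| where isnotempty(customDimensions.jobExecutionDelayMs)
| extend delayMs = todouble(customDimensions.jobExecutionDelayMs)
| summarize p95DelayMs = percentile(delayMs, 95) by bin(timestamp, 1m)
| render timechart
```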

 

Execution Delay - Stateful Workflows

Execution Delay (95th percentile) - Stateful Workflows - 40k messages

 

 

 

Execution Delay (95th percentile) - Stateful Workflows - 100k messages

 

 

 

 

Execution Delay - Stateless Workflows

 

Execution Delay (95th percentile) - Stateless Workflows - 40k messages

 

 

 

 

 

Execution Delay (95th percentile) - Stateless Workflows - 100k messages

 

As noted above, an execution delay of 200 ms or less is optimal; consistently higher execution delays mean that current system resources are at capacity and scaling out is needed.

Another point to notice in the charts above is how much lower the execution delay is for stateless workflows compared to stateful workflows.

 

CPU and Memory Utilization

 

The diagrams below present the CPU and memory utilization for each instance added to support the workload, in each of the App Service plans. The default scaling configuration tries to keep CPU utilization between 60% and 90%: under 60% indicates that an instance is underutilized, while over 90% indicates that an instance is under stress, leading to performance degradation. Because CPU usually becomes a bottleneck much sooner than memory, there is no default scaling rule for memory utilization. This is confirmed by the data below.
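
Per-instance CPU and memory figures like the ones below can be pulled from the standard Application Insights performance counters; a minimal sketch, using 'Private Bytes' as a stand-in for process memory:

```kusto
// Sketch: 90th percentile CPU and average process memory, per instance
// per minute, from the standard App Insights performance counters.
// 'Private Bytes' stands in for the memory working set metric here.
performanceCounters
| where timestamp > ago(2h)
| where name in ("% Processor Time", "Private Bytes")
| summarize p90Cpu = percentileif(value, 90, name == "% Processor Time"),
            avgMemoryBytes = avgif(value, name == "Private Bytes")
    by cloud_RoleInstance, bin(timestamp, 1m)
```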

 

90th Percentile CPU Utilization - 40K Messages Workload

 

CPU Utilization Per Minute - 40K Messages - WS1 App Service Plan

 

 

 

CPU Utilization Per Minute - 40K Messages - WS2 App Service Plan

 

 

CPU Utilization Per Minute - 40K Messages - WS3 App Service Plan

 

Average Memory Utilization (in Bytes) - 40K Messages Workload

 

Memory Utilization (in Bytes) Per Minute - 40K Messages - WS1 App Service Plan

 

Memory Utilization (in Bytes) Per Minute - 40K Messages - WS2 App Service Plan

Memory Utilization (in Bytes) Per Minute - 40K Messages - WS3 App Service Plan

 

90th Percentile CPU Utilization - 100K Messages Workload

CPU Utilization Per Minute - 100K Messages - WS1 App Service Plan

 

CPU Utilization Per Minute - 100K Messages - WS2 App Service Plan

 

CPU Utilization Per Minute - 100K Messages - WS3 App Service Plan

 

Average Memory Utilization (in Bytes) - 100K Messages Workload

 

Memory Utilization (in Bytes) Per Minute - 100K Messages - WS1 App Service Plan

 

 

Memory Utilization (in Bytes) Per Minute - 100K Messages - WS2 App Service Plan

 

 

Memory Utilization (in Bytes) Per Minute - 100K Messages - WS3 App Service Plan

Posted at https://sl.advdat.com/3vXU8xC