Wednesday, August 18, 2021

Azure Customers - ABAP dump DBSQL_SQL_ERROR with error message TCP RESET 10054 on Azure ILB

This blog describes a failure that has appeared over the last 12 months for some customers running DBMS High Availability solutions on Azure that require an Azure Load Balancer. These customers may have observed ABAP short dumps containing the error code 10054. We have since identified the problem and documented a solution in the new SAP Note 3083711 - Azure - ST22 shows DBSQL_SQL_ERROR.

1.    How do you Diagnose the Problem?

To determine whether you have encountered the problem, check the conditions below. All of them must be true for SAP Note 3083711 to apply:

  1. The DBMS solution must use a Virtual IP Address and the Azure Standard Load Balancer (such as SQL Server AlwaysOn or HANA HSR architectures)
  2. The DBMS solution must use Operating System clustering. Examples include Windows Cluster for SQL Server AlwaysOn or FCI, or Pacemaker for HANA
  3. The Standard Load Balancer must have explicit TCP port rules configured and NOT HA Ports
  4. The ABAP dump DBSQL_SQL_ERROR (or similar) will contain a network-level error
  5. The error message in the dump may contain “connection was forcibly closed”, TCP reset, or 10054
  6. The ABAP dump will be triggered by a failure of a Secondary Service Connection. An example of a Secondary Connection is illustrated below. The “>>>>>” indicates the line of code that triggered the DBSQL_SQL_ERROR

 

The screenshot shows an example of a Secondary Service Connection. Note: the actual name of the Secondary Connection differs between ABAP programs. The ABAP syntax for a Secondary Service Connection is “connection (<connection name or variable>)”.

[Screenshot: example of a Secondary Service Connection in an ABAP dump]

The SQL operation will typically be on a very high concurrency table such as NRIV or VARINUM.

The problem will not occur on the Basic Load Balancer, because the Basic Load Balancer runs different logic. Nevertheless, as general guidance we advise customers who are using the Basic Load Balancer to move to the Standard Load Balancer for its significant latency improvements, independent of the issue described in 3083711 - Azure - ST22 shows DBSQL_SQL_ERROR.
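As a rough aid for conditions 4 and 5 above, the text of a dump can be scanned for the network-level markers quoted in this post. The following is only a minimal sketch: the function name, the marker labels, and the idea of pasting dump text into a string are assumptions for illustration, not part of SAP Note 3083711, and a match does not by itself confirm the other conditions.

```python
import re

# Markers taken from the wording in this post; the labels are hypothetical.
MARKERS = {
    "forcibly closed": r"connection was forcibly closed",
    "tcp reset": r"tcp\s+reset",
    "10054": r"\b10054\b",
}


def find_network_markers(dump_text: str) -> list[str]:
    """Return the labels of all network-level markers found in the dump text."""
    text = dump_text.lower()  # match case-insensitively
    return [label for label, pattern in MARKERS.items() if re.search(pattern, text)]
```

An empty result suggests the dump has a different cause. A non-empty result only means the remaining conditions (Standard Load Balancer, explicit port rules, Secondary Service Connection) are worth checking.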

2.    What Causes this Problem?

The problem is caused by a regression in how Azure networking handles TCP reset injection. The problem will only occur with High Availability solutions that use the Azure Standard Load Balancer to present a Virtual IP Address created by either Windows Cluster or Pacemaker.

3.    How to Resolve this Problem?

The problem can be quickly and easily resolved by following the procedure below.

 

  1. Arrange a short downtime for the SAP Application
  2. Stop the SAP Application servers
  3. Shut down the DBMS using the High Availability solution (Windows Cluster or Linux Pacemaker)
  4. Delete the port rules from the Azure Standard Load Balancer
  5. Configure “HA Ports”
  6. Save the configuration on the Azure Standard Load Balancer
  7. Restart the DBMS solution
  8. Restart the SAP Application servers
  9. Confirm that the SAP Application server to DB server latency is within limits as described in 2931465 - When to use Proximity Placement Groups on Azure to Reduce Network Latency – 3 Tier NetWeaver or S/4HANA architecture
  10. If the ABAP Meter results are close to or below the values in Note 2931465, do not configure Proximity Placement Groups
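Steps 4 and 5 can be pictured as a change to the rule definition. An HA Ports rule on the Standard Load Balancer uses protocol "All" with frontend and backend port 0, instead of one rule per TCP port. The sketch below works on plain dictionaries whose keys follow the load-balancing rule property names used in ARM templates (`protocol`, `frontendPort`, `backendPort`, `enableFloatingIP`). The flattened dictionary shape, the function names, and the rule names are assumptions for illustration; it does not call any Azure API.

```python
def is_ha_ports_rule(rule: dict) -> bool:
    """True if the rule is an HA Ports rule: protocol 'All', both ports 0."""
    return (
        str(rule.get("protocol", "")).lower() == "all"
        and rule.get("frontendPort") == 0
        and rule.get("backendPort") == 0
    )


def ha_ports_rule_from(existing_rule: dict, name: str) -> dict:
    """Build an HA Ports rule that keeps the other settings of an explicit port rule.

    The input rule is copied, not modified. Settings such as floating IP,
    probe, and backend pool references are carried over unchanged.
    """
    new_rule = dict(existing_rule)
    new_rule.update(name=name, protocol="All", frontendPort=0, backendPort=0)
    return new_rule
```

For example, a hypothetical explicit rule for TCP port 1433 with `enableFloatingIP` set to `True` would become a rule with protocol "All" and ports 0, with floating IP still enabled. The actual change is made on the load balancer itself as described in the steps above.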

 

High availability ports overview in Azure - Azure Load Balancer | Microsoft Docs

 

 

Posted at https://sl.advdat.com/2W3WCuS