Tuesday, March 15, 2022

Conflict between API Management stv1 and NAT Gateway

Symptom:

After deploying your API Management (APIM) service into a virtual network and opening it in the Azure portal, you may see an error such as “cannot connect to management endpoint via port 3443”.

In addition, the “Resource health” blade of the APIM instance shows the service as “unavailable”.

This blog describes one typical scenario that causes this issue.

 

How the APIM health check works:

Each APIM instance exposes three main endpoints:

  1. Management endpoint
  2. Gateway endpoint
  3. Developer portal endpoint

[Diagram: the three APIM endpoints]

Please note:

  • If API Management is deployed in Internal VNet mode, the APIM control plane regularly sends health check requests only to the management endpoint. If the control plane does not receive a response, it considers the API Management service down.
  • In non-VNet or External VNet mode, the APIM control plane sends health check requests to all three endpoints to determine the health status of API Management.

Therefore, if the API Management control plane cannot reach the APIM management endpoint, the symptoms described above appear.
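The probing rule above can be summarized in a short Python sketch. It models the behavior described in this post, not the actual control plane implementation: the `VnetMode` enum and the `probed_endpoints` / `service_status` helpers are hypothetical, and it assumes that any probed endpoint failing to answer marks the service unavailable.

```python
from enum import Enum


class VnetMode(Enum):
    """Hypothetical labels for the three APIM network modes."""
    NONE = "none"
    EXTERNAL = "external"
    INTERNAL = "internal"


ALL_ENDPOINTS = ("management", "gateway", "developer_portal")


def probed_endpoints(mode: VnetMode) -> tuple[str, ...]:
    """Endpoints the control plane health-checks, per the rules in this post."""
    if mode is VnetMode.INTERNAL:
        # Internal VNet mode: only the management endpoint is probed.
        return ("management",)
    # Non-VNet or External VNet mode: all three endpoints are probed.
    return ALL_ENDPOINTS


def service_status(mode: VnetMode, reachable: set[str]) -> str:
    """Assumption: 'available' only if every probed endpoint answered."""
    missing = [e for e in probed_endpoints(mode) if e not in reachable]
    return "unavailable" if missing else "available"


# Management endpoint unreachable in Internal VNet mode: reported as down.
print(service_status(VnetMode.INTERNAL, {"gateway", "developer_portal"}))  # unavailable
print(service_status(VnetMode.EXTERNAL, set(ALL_ENDPOINTS)))  # available
```

In every mode the management endpoint is probed, which is why losing connectivity to it alone is enough to produce the symptom.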

How APIM stv1 conflicts with NAT Gateway:

 

Why would a NAT gateway cause APIM to be reported as “down”? And how does a NAT gateway affect the health check mechanism described above?

 

If a NAT gateway is associated with the APIM subnet, a health check request sent by the APIM control plane to the management endpoint takes the following path:

 

  1. The APIM control plane sends a health check request to the APIM management endpoint.
  2. APIM returns the response to the NAT gateway.
  3. The NAT gateway forwards the response back to the APIM control plane.

[Diagram: health check traffic path through the NAT gateway]

However, if APIM runs on stv1, these requests fail. The root cause is explained below.

 

Root Cause:

According to the official NAT Gateway documentation, NAT is compatible with Standard SKU public IP addresses, public IP prefix resources, or a combination of both. You can use a public IP prefix directly or distribute the public IP addresses of the prefix across multiple NAT gateway resources. NAT grooms all traffic to the range of IP addresses of the prefix. Basic resources, such as a Basic load balancer or Basic public IPs, aren't compatible with NAT. Basic resources must be placed on a subnet that is not associated with a NAT gateway. A Basic load balancer and a Basic public IP can be upgraded to Standard to work with a NAT gateway.

Reference: https://docs.microsoft.com/en-us/azure/virtual-network/nat-gateway/nat-overview#virtual-network-nat-basics
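The compatibility rule quoted above can be expressed as a check over the resources in a subnet. This is a minimal sketch under simplifying assumptions: the `Resource` type and the `nat_conflicts` helper are hypothetical, each resource is described only by its kind and SKU name ("Basic" or "Standard"), and nothing is queried from Azure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical, simplified description of a networking resource."""
    name: str
    kind: str  # e.g. "public_ip", "public_ip_prefix", "load_balancer"
    sku: str   # "Basic" or "Standard"


def nat_conflicts(subnet_has_nat: bool, resources: list[Resource]) -> list[str]:
    """Return the names of resources that violate the NAT Gateway rule.

    Per the NAT Gateway documentation, Basic SKU load balancers and public IPs
    must not be placed on a subnet associated with a NAT gateway.
    """
    if not subnet_has_nat:
        # Without a NAT gateway on the subnet, Basic resources are allowed.
        return []
    return [r.name for r in resources if r.sku.lower() == "basic"]


subnet = [
    Resource("apim-ip", "public_ip", "Basic"),
    Resource("apim-lb", "load_balancer", "Basic"),
]
print(nat_conflicts(True, subnet))   # ['apim-ip', 'apim-lb']
print(nat_conflicts(False, subnet))  # []
```

The resource names in the usage lines are made up for illustration.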

 

API Management components:

APIM runs on two types of underlying compute platform:

| APIM version | Underlying compute platform | Public IP tier | Load balancer tier |
|---|---|---|---|
| stv1 | Cloud Service | Basic | Basic |
| stv2 | VMSS | Standard | Standard |

 

Therefore, an stv1 APIM instance is equipped with a Basic tier public IP and a Basic tier load balancer, neither of which is compatible with NAT Gateway. As a result, outbound traffic through the NAT gateway fails, and all traffic to the internet is blocked, as shown in the diagram below:

 

[Diagram: outbound traffic from an stv1 APIM instance blocked at the NAT gateway]
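The table and the reasoning above can be combined into a simple lookup. This is a sketch only: the `PLATFORMS` dictionary restates the table in this post, and `nat_gateway_supported` is a hypothetical helper, not part of any Azure SDK.

```python
# Restates the table above:
# platform version -> (compute platform, public IP tier, load balancer tier)
PLATFORMS = {
    "stv1": ("Cloud Service", "Basic", "Basic"),
    "stv2": ("VMSS", "Standard", "Standard"),
}


def nat_gateway_supported(platform_version: str) -> bool:
    """True if both the public IP and the load balancer are Standard tier,
    which NAT Gateway requires."""
    _compute, ip_tier, lb_tier = PLATFORMS[platform_version.lower()]
    return ip_tier == "Standard" and lb_tier == "Standard"


for version in PLATFORMS:
    print(version, nat_gateway_supported(version))
# stv1 False
# stv2 True
```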

 

Solution (apply either option):

  1. Remove the NAT gateway from the APIM subnet.
  2. Upgrade the APIM instance to stv2.

 

Additional information:

How can you identify which compute platform version your APIM instance uses?

[Screenshot: where to find the APIM compute platform version]
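Besides checking in the portal, the platform version can also be read from the service resource itself. Recent versions of the API Management REST API expose a `platformVersion` property on the service resource (with values such as `stv1` and `stv2`); treat the exact field name and the API version that returns it as assumptions to verify against the REST reference. The sketch below only parses a resource JSON document you have already retrieved; `read_platform_version` is a hypothetical helper, and the sample document is made up.

```python
import json


def read_platform_version(resource_json: str) -> str:
    """Return the compute platform version from an APIM service resource document.

    Assumes the Azure Resource Manager shape {"properties": {"platformVersion": ...}}
    and returns "undetermined" if the field is absent (e.g. an older api-version).
    """
    resource = json.loads(resource_json)
    return resource.get("properties", {}).get("platformVersion", "undetermined")


# Hypothetical, trimmed example of a service resource document.
sample = '{"name": "contoso-apim", "properties": {"platformVersion": "stv1"}}'
print(read_platform_version(sample))  # stv1
```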

Our engineer Hailey Ding has also written a blog post that describes the differences between the two APIM compute platform versions and how to upgrade an APIM instance from stv1 to stv2:

Compute Platform Versions for Azure API management service - Microsoft Tech Community

Posted at https://sl.advdat.com/3CKC5Nd