Tuesday, August 31, 2021

Migrate VMs behind Standard Load Balancer to another region with Azure Site Recovery

 

Inquiry from customer

 

My customer asked me about the following topic.

 

We have a system which consists of Azure Load Balancer and two VMs behind the load balancer. To meet our rules around BCDR (business continuity & disaster recovery), we would like to migrate this system with Azure Site Recovery (ASR), but the error “Site Recovery configuration failed (151196)” prevents us from configuring ASR. What is the root cause? Do you have any workarounds or solutions?

As the inquiry was not clear to me, I asked them to elaborate on their environment and the issue.

  • They use Standard Load Balancer.
  • ExpressRoute connects their on-premises environment to Azure, and forced tunneling is configured.
  • The application running on the VMs uses Table storage as its data source. They have already configured a Service Endpoint for Table storage.
  • As no state is shared between the VMs, a simple VM-by-VM migration is sufficient.

The following diagram appears to reflect the customer’s environment.

image-23[1].png

 

The VNet connected to ExpressRoute is not a hub network, so the integration between ExpressRoute and Site Recovery described in the following document is not required in this case.

 

Integrate ExpressRoute with disaster recovery for Azure VMs

https://docs.microsoft.com/azure/site-recovery/azure-vm-disaster-recovery-with-expressroute

 

Cause

If you are familiar with Azure, you may spot the root cause at once.

Standard Load Balancer prevents the VMs behind it from accessing destinations outside the VNet by default. So, outbound access to ASR-related resources outside the VNet has to be configured. Forced tunneling is indeed configured, but it does not provide this access for VMs behind Standard Load Balancer.

 

This is mentioned in the following document.

 

Issue 2: Site Recovery configuration failed (151196)

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-troubleshoot-network-connectivity#issue-2-site-recovery-configuration-failed-151196

 

If the VMs are behind a Standard internal load balancer, by default, it wouldn’t have access to the Microsoft 365 IPs such as login.microsoftonline.com. Either change it to Basic internal load balancer type or create outbound access as mentioned in the article Configure load balancing and outbound rules in Standard Load Balancer using Azure CLI.

ASR needs access to Azure Active Directory endpoints such as login.microsoftonline.com, but access to such services had not been configured. Forced tunneling lets you redirect, or “force”, all Internet-bound traffic back to your on-premises location, and the default route is advertised from the on-premises side. However, forced tunneling does not work for VMs behind Standard Load Balancer.

 

Outbound connectivity

The following outbound connectivity from the VMs is required when replicating them with Azure Site Recovery.

  • Storage: *.blob.core.windows.net
  • Azure Active Directory: login.microsoftonline.com
  • Replication: *.hypervrecoverymanager.windowsazure.com
  • Service Bus: *.servicebus.windows.net
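When reviewing outbound rules on a firewall or proxy, it can help to check host names against the endpoint list above. The following is a minimal sketch, not part of any Azure tool: the helper names and the storage account name mycachestorage are hypothetical, and Python's fnmatch is used as a simple stand-in for DNS wildcard matching.

```python
from fnmatch import fnmatchcase

# Outbound endpoints required for ASR replication, as listed above.
# Note: fnmatch's "*" also matches dots, so it is slightly more permissive
# than a strict single-label DNS wildcard.
ASR_REQUIRED_ENDPOINTS = {
    "Storage": "*.blob.core.windows.net",
    "Azure Active Directory": "login.microsoftonline.com",
    "Replication": "*.hypervrecoverymanager.windowsazure.com",
    "Service Bus": "*.servicebus.windows.net",
}


def required_service(fqdn):
    """Return the ASR service an FQDN belongs to, or None if it is not in the list."""
    host = fqdn.lower().rstrip(".")  # DNS names are case-insensitive
    for service, pattern in ASR_REQUIRED_ENDPOINTS.items():
        if fnmatchcase(host, pattern):
            return service
    return None


def missing_endpoints(allowed_patterns):
    """List the ASR services whose pattern is absent from an outbound allow-list."""
    allowed = {p.lower() for p in allowed_patterns}
    return [s for s, p in ASR_REQUIRED_ENDPOINTS.items() if p not in allowed]


if __name__ == "__main__":
    # "mycachestorage" is a hypothetical cache storage account name.
    print(required_service("mycachestorage.blob.core.windows.net"))  # Storage
    print(missing_endpoints(["*.blob.core.windows.net", "login.microsoftonline.com"]))
```

In this example the allow-list still lacks the Replication and Service Bus endpoints, which is the kind of gap that surfaces as error 151196.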

 

This is mentioned in the following document.

 

Troubleshoot Azure-to-Azure VM network connectivity issues

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-troubleshoot-network-connectivity

 

Solutions

We have the following options to establish outbound connectivity required for replicating VMs with Azure Site Recovery.

  1. Replace Standard Load Balancer with Basic Load Balancer.
  2. Assign public IPs to the VMs behind Standard Load Balancer.
  3. Assign a NAT gateway to the subnet where the VMs connect.
  4. Add a Public Load Balancer and configure an outbound rule for the VMs.
  5. Add Azure Firewall, configure a UDR (user-defined route) that sends 0.0.0.0/0 to Azure Firewall, and associate the UDR with the subnet where the VMs connect.
  6. Use Service Endpoints and Private Endpoints to open routes to the required services.

 

1. Replace Standard Load Balancer with Basic Load Balancer.

Basic Load Balancer permits the VMs behind it to connect to destinations outside the VNet, while Standard Load Balancer doesn’t by default.

 

Azure Load Balancer SKUs

https://docs.microsoft.com/azure/load-balancer/skus

 
image-22[1].png

 

When forced tunneling is configured, replication traffic leaves the Azure boundary (that is, it goes out to the Internet). As the following document says, this configuration is not recommended. If forced tunneling is not used, this option is acceptable.

 

Forced tunneling

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-about-networking#forced-tunneling

 

2. Assign public IPs to VMs behind Standard Load Balancer.

A public IP is assigned to each VM so that the VMs can reach destinations outside the VNet directly.

image-21[1].png

 

With this solution, the VMs can not only send outbound traffic but also receive inbound traffic from outside the VNet. So, the following configuration is mandatory.

  • An NSG (Network Security Group) should be configured to manage inbound/outbound traffic.
  • It is simpler to assign the NSG to the subnet where the VMs connect than to each VM’s NIC.
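To illustrate why the subnet NSG matters once public IPs are attached, here is a simplified sketch of how NSG rules are evaluated: rules are processed in priority order (lower number first) and processing stops at the first match. The model is an assumption for illustration only. It ignores service tags, protocols, and destination addresses, and the address ranges and the AllowHttpsFromOnPrem rule are hypothetical placeholders.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class NsgRule:
    name: str
    priority: int          # lower number is evaluated first
    source: str            # source CIDR; "0.0.0.0/0" stands in for "Any"
    port: Optional[int]    # destination port; None stands in for "Any"
    action: str            # "Allow" or "Deny"


def evaluate_inbound(rules, source_ip, port):
    """Return (rule name, action) of the first matching rule in priority order."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if src in ipaddress.ip_network(rule.source) and rule.port in (None, port):
            return rule.name, rule.action
    return None, "Deny"  # fallback when nothing matched


# Hypothetical subnet NSG: 10.1.0.0/16 (VNet) and 192.168.0.0/16 (on-premises)
# are placeholders, not the customer's real address space.
SUBNET_NSG = [
    NsgRule("AllowHttpsFromOnPrem", 100, "192.168.0.0/16", 443, "Allow"),
    # Simplified stand-ins for the built-in default inbound rules.
    NsgRule("AllowVnetInBound", 65000, "10.1.0.0/16", None, "Allow"),
    NsgRule("DenyAllInBound", 65500, "0.0.0.0/0", None, "Deny"),
]

if __name__ == "__main__":
    print(evaluate_inbound(SUBNET_NSG, "203.0.113.7", 3389))  # ('DenyAllInBound', 'Deny')
```

Because the NSG is attached to the subnet, one rule set covers both VMs, and any Internet source that no higher-priority rule allows falls through to the deny rule.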

In this case, all traffic between the VMs and Azure services stays within the Microsoft network and does not go out to the Internet.

 

3. Assign a NAT gateway to the subnet where the VMs connect.

Instead of assigning public IP addresses to the VMs, a NAT gateway is assigned to the subnet where the VMs connect.

image-17[1].png

 

NAT gateway provides outbound access only; inbound traffic cannot use the public IP address(es) assigned to the NAT gateway. So, the NAT gateway does not expose the VMs to access from outside the VNet. In this case, all traffic between the VMs and Azure services stays within the Microsoft network and does not go out to the Internet.

 

Virtual Network NAT Documentation 

https://docs.microsoft.com/azure/virtual-network/nat-gateway/

 

4. Add a Public Load Balancer and configure an outbound rule for the VMs.

A Public Load Balancer with an outbound rule permits outbound traffic from the VMs behind the load balancer.

image-18[1].png

 

This solution is similar to the 2nd and 3rd solutions, but it is more expensive than either of them. All traffic between the VMs and Azure services stays within the Microsoft network and does not go out to the Internet.

 

5. Add Azure Firewall, configure a UDR (user-defined route) that sends 0.0.0.0/0 to Azure Firewall, and associate the UDR with the subnet where the VMs connect.

Azure Firewall allows us to manage inbound/outbound traffic to/from the VMs. The default route of the subnet where the VMs connect is pointed to Azure Firewall with a UDR (user-defined route).

image-19[1].png

Azure Firewall can filter inbound/outbound traffic not only by IP address but also by FQDN, while NSG cannot filter by FQDN. All traffic between the VMs and Azure services stays within the Microsoft network and does not go out to the Internet.

Azure Firewall is indeed powerful, but this is the most expensive of all the solutions.
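Why does a UDR for 0.0.0.0/0 send traffic to the firewall even though forced tunneling also advertises a default route? Azure picks the route with the longest matching prefix, and when prefixes tie, a user-defined route wins over a BGP route, which wins over a system route. The sketch below models that selection in a simplified way; the route table, the 10.1.0.0/16 VNet range, and the public destination address are hypothetical.

```python
import ipaddress

# Tie-break when prefixes are equally specific:
# user-defined route > BGP route > system route.
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "System": 2}


def select_route(routes, destination):
    """Return the (prefix, next_hop_type, source) route chosen for a destination."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    # Longest prefix first, then source priority.
    return min(
        matches,
        key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, SOURCE_PRIORITY[r[2]]),
    )


# Hypothetical effective routes for the VM subnet in option 5.
ROUTES = [
    ("10.1.0.0/16", "VnetLocal", "System"),
    ("0.0.0.0/0", "Internet", "System"),
    ("0.0.0.0/0", "VirtualNetworkGateway", "BGP"),  # forced-tunneling default route
    ("0.0.0.0/0", "VirtualAppliance", "User"),      # UDR pointing to Azure Firewall
]

if __name__ == "__main__":
    print(select_route(ROUTES, "10.1.0.5")[1])   # VnetLocal
    print(select_route(ROUTES, "20.60.1.1")[1])  # VirtualAppliance
```

Without the UDR (the first three routes only), the same public destination would follow the BGP default route back to on-premises, which is the forced-tunneling behavior described earlier.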

 

6. Use Service Endpoints and Private Endpoints to open routes to the required services.

Instead of assigning public IP address(es) to either the VMs or the subnet, routes to the services required for ASR replication are opened with Service Endpoints and Private Endpoints.

image-25[1].png

The following document describes how to enable replication with private endpoints.

 

Replicate machines with private endpoints

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-how-to-enable-replication-private-endpoints

 

The services required for ASR replication and the acceptable option(s) for each are listed below.

  • Azure Active Directory: Service Endpoint only
  • Service Bus: Service Endpoint only (as the destination is not fixed, Service Endpoint is the only option)
  • Storage: either Service Endpoint or Private Endpoint
  • Recovery Services vault: Private Endpoint only
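The per-service constraints above can be encoded as a small check on a planned configuration before enabling replication. This is a hypothetical sketch: the plan format and function name are illustrative and do not correspond to any Azure API.

```python
# Acceptable connectivity option(s) per ASR-related service, as listed above.
ACCEPTABLE_OPTIONS = {
    "Azure Active Directory": {"Service Endpoint"},
    "Service Bus": {"Service Endpoint"},
    "Storage": {"Service Endpoint", "Private Endpoint"},
    "Recovery Services vault": {"Private Endpoint"},
}


def validate_plan(plan):
    """Return a list of problems in a {service: chosen option} plan (empty if OK)."""
    problems = []
    for service, allowed in ACCEPTABLE_OPTIONS.items():
        chosen = plan.get(service)
        if chosen is None:
            problems.append(f"{service}: no route configured")
        elif chosen not in allowed:
            problems.append(f"{service}: {chosen} is not supported")
    return problems


if __name__ == "__main__":
    # Hypothetical plan matching option 6.
    plan = {
        "Azure Active Directory": "Service Endpoint",
        "Service Bus": "Service Endpoint",
        "Storage": "Private Endpoint",
        "Recovery Services vault": "Private Endpoint",
    }
    print(validate_plan(plan))  # []
```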

This solution is ideal for the following reasons.

  • No traffic leaves the Azure boundary, so it is kept secure.
  • No public IP addresses are required.
  • It is cost-effective.

Note the following points when configuring this solution.

  • The roles to be granted on the cache storage account to the managed identity of the Recovery Services vault vary depending on the storage account SKU (standard or premium) used for cache storage.

      Storage SKU   Roles to be granted
      Standard      Contributor, Storage Blob Data Contributor
      Premium       Contributor, Storage Blob Data Owner

 

  • In the document above, configuring a private endpoint for the cache storage account is optional. In this case, however, we have to configure a Private Endpoint or Service Endpoint for the cache storage account, as the VMs are behind Standard Load Balancer.
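Before enabling replication, the role requirements in the table above can be checked against the roles already assigned on the cache storage account. This is a hypothetical helper sketch: it does not call the Azure SDK, and how you obtain the list of assigned roles is left out.

```python
# Roles to grant to the vault's managed identity on the cache storage account,
# keyed by storage account SKU tier (from the table above).
CACHE_STORAGE_ROLES = {
    "standard": ["Contributor", "Storage Blob Data Contributor"],
    "premium": ["Contributor", "Storage Blob Data Owner"],
}


def roles_for_cache_storage(sku_tier):
    """Return the roles required for a given storage SKU tier."""
    try:
        return CACHE_STORAGE_ROLES[sku_tier.lower()]
    except KeyError:
        raise ValueError(f"unknown storage SKU tier: {sku_tier!r}") from None


def missing_roles(sku_tier, assigned):
    """Return the required roles not yet present in the assigned roles."""
    assigned_set = set(assigned)
    return [r for r in roles_for_cache_storage(sku_tier) if r not in assigned_set]


if __name__ == "__main__":
    print(missing_roles("Standard", ["Contributor"]))  # ['Storage Blob Data Contributor']
```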

Summary and customer decision

There are several options for this situation, and each has pros and cons. After I explained these options, the customer chose option #6.

# | Does traffic leave Azure boundary? | Is public IP needed? | Cost | Configuration points | Remarks
1 | Yes, in some cases | No | Outgoing traffic cost might increase. | On-premises firewall rules | With forced tunneling, storage replication traffic goes to the Internet.
2 | No | Yes | - | NSG (inbound/outbound) | -
3 | No | Yes | - | NSG (especially outbound) | -
4 | No | Yes | - | Public Load Balancer (outbound rule) | -
5 | No | Yes | Azure Firewall is expensive. | UDR, firewall rules | -
6 | No | No | - | Grant roles to the managed identity of the Recovery Services vault | -
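The comparison above can be read as a simple filter. The sketch below encodes two columns of the table and shortlists options under assumed criteria (traffic must stay within Azure and no public IP may be used); those criteria are an illustrative assumption based on the reasons given for option 6, not a statement of the customer's formal requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    number: int
    name: str
    may_leave_azure: bool   # "Yes, in some cases" in the table -> True
    needs_public_ip: bool


OPTIONS = [
    Option(1, "Basic Load Balancer", True, False),
    Option(2, "Public IPs on VMs", False, True),
    Option(3, "NAT gateway", False, True),
    Option(4, "Public Load Balancer with outbound rule", False, True),
    Option(5, "Azure Firewall with UDR", False, True),
    Option(6, "Service Endpoint and Private Endpoint", False, False),
]


def shortlist(options, allow_leaving_azure=False, allow_public_ip=False):
    """Return the numbers of the options that satisfy the given constraints."""
    return [
        o.number
        for o in options
        if (allow_leaving_azure or not o.may_leave_azure)
        and (allow_public_ip or not o.needs_public_ip)
    ]


if __name__ == "__main__":
    print(shortlist(OPTIONS))  # [6]
```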

 

Posted at https://sl.advdat.com/3zA9F6a