Wednesday, June 30, 2021

Cross-region data replication using rsync

Customer Challenge

The customer wanted to use Azure NetApp Files (ANF) for their SAP app-tier shared storage (e.g. /sapmnt, /usr/sap/SID/SYS, etc.) but not for HANA database data. They have their primary instances in US East, and their disaster recovery environment in US West 2. Normally we would use ANF cross-region replication (CRR) to replicate between these regions; unfortunately, ANF CRR doesn't support replication between two different subscriptions at this time, and this customer is using a different subscription in each region.

 

Potential Solutions

There are several potential solutions for this, including NetApp CloudSync and Linux rsync. We decided on rsync because it is included with Linux and we were on a very short timeframe for this project. rsync is a versatile file-copying tool that can copy between directories or volumes on a single host, between two hosts over ssh, or to a remote rsync daemon. It uses a "delta-transfer" algorithm that sends only the differences between the source files and the files at the destination.

 

One downside of the configuration described below is that we need a VM in each region to send and receive the rsync replication data, since ANF does not support mounting a volume located in one region from VMs in another region. The two machines need to be able to communicate over the network – in this case the two regional vnets were connected via Azure global vnet peering. If the volumes were in the same region, we would be able to mount both volumes on a single VM and use rsync for the data transfer.

 

We considered using one of the existing machines in the architecture (e.g. the ERS machine) to do the replication, but that would increase complexity on those machines, so we decided to use a dedicated virtual machine in each region to support this replication. Each VM mounts the ANF volume(s) in its own region, and the rsync command runs between the two VMs to do the actual data replication.

 

There are two ways that rsync can actually replicate the data in this scenario:

  • Over the ssh protocol. This requires setting up ssh keys so that the replication user can ssh between the machines without a Linux password.
  • By connecting to a remote rsync daemon (i.e. the rsync system service). This requires setting up and managing the rsync daemon on the receiving side.

We decided on the first option, since ssh was already set up for their configuration management system (e.g. Chef).
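
As a rough sketch of that ssh key setup (assuming root is the replication user, as in the examples later in this post; adjust the user and key type to your own standards), the key pair is generated on the primary VM and the public key is copied to the replica:

# On the primary VM (anf-client-west2): generate a key pair with no passphrase
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Copy the public key to the replica VM so rsync-over-ssh does not prompt for a password
ssh-copy-id root@anf-client-east

# Verify that a non-interactive login works
ssh root@anf-client-east hostname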

 

Solution Caveats

  • rsync is a file-level copy/replication solution (as opposed to real-time block-level replication) and operates periodically - it traverses all of the files in the replicated directories or volume and copies new or changed files to the destination volume. Because of this, there will be some time delay between the moment a file is written and when it appears on the destination volume.

  • rsync is single-threaded - this will limit the overall throughput between the two volumes/VMs. This wasn't a concern for this particular application, but it would be wise to test throughput in your own scenario. A simple way to address this limitation is to run rsync against specific subdirectories of the volume, rather than the root directory, so that several rsync processes can run in parallel (see the sketch after this caveat list).

  • The first time rsync runs against the volume, it will take significantly longer than subsequent runs, due to the initial full data transfer.
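
As a hypothetical sketch of the parallel approach mentioned above (the per-SID directory names match the consolidation example later in this post, and should be adjusted to your own volume layout):

# Run one rsync per top-level (per-SID) directory, in parallel
for dir in usrsapQAS usrsapNW1; do
    rsync -az --delete --exclude=.snapshot \
        /vol-west2/"$dir"/ root@anf-client-east:/vol-east/"$dir"/ &
done
wait   # block until all background rsync processes have finished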

Solution Configuration

In our case, here are the mounts that we set up (for initial testing of the solution):

Region       Virtual machine              Mount on virtual machine
US West 2    anf-client-west2 (primary)   /vol-west2
US East      anf-client-east (replica)    /vol-east
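
For illustration, the mounts on each VM might look something like this; the export paths match the table above, but the volume IP addresses and NFS mount options here are placeholders, and the exact mount instructions for a given ANF volume are shown in the Azure portal:

# On anf-client-west2 (the primary), mount the US West 2 ANF volume
sudo mkdir -p /vol-west2
sudo mount -t nfs -o rw,hard,vers=3,tcp <anf-west2-ip>:/vol-west2 /vol-west2

# On anf-client-east (the replica), mount the US East ANF volume
sudo mkdir -p /vol-east
sudo mount -t nfs -o rw,hard,vers=3,tcp <anf-east-ip>:/vol-east /vol-east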

 

To actually copy the data, we used this command on the primary anf-client-west2 machine:

rsync -azP --delete --exclude=.snapshot --log-file=/var/log/rsync.log /vol-west2/ root@anf-client-east:/vol-east

The options we used above are these:

option                          description
-a                              Archive mode – rsync will do a recursive copy, and preserve modification times, links, file ownership and permissions.
-z                              Compress data over the network.
-P                              Keep partially transferred files, and show the progress during transfer.
--delete                        Delete files on the destination that no longer exist on the source.
--exclude=.snapshot             Exclude the ANF .snapshot directory.
--log-file=/var/log/rsync.log   Write the rsync log to /var/log/rsync.log.
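
Before scheduling the command, it can be useful to do a dry run with rsync's -n (--dry-run) and --itemize-changes options, which list what would be transferred or deleted without actually changing anything, for example:

rsync -azn --itemize-changes --delete --exclude=.snapshot /vol-west2/ root@anf-client-east:/vol-east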

 

To schedule rsync via cron, we added this entry to the root crontab using the sudo crontab -e command, which lets you edit the root crontab:

* * * * * rsync -azP --delete --exclude=.snapshot --log-file=/var/log/rsync.log /vol-west2/ root@anf-client-east:/vol-east

The initial asterisks tell cron to run this every minute – this may be excessive depending on requirements.

To run every 5 minutes, this would be the configuration:

*/5 * * * * rsync -azP --delete --exclude=.snapshot --log-file=/var/log/rsync.log /vol-west2/ root@anf-client-east:/vol-east

For a more complete solution, it is recommended to run rsync from a shell script that checks whether rsync is running already, for example:

#!/bin/bash
# Simple lockfile guard so overlapping cron runs don't start a second rsync
lockfile=/var/anf-sync/lockfile
mkdir -p /var/anf-sync

if test -f "$lockfile"; then
       echo "rsync currently running, exiting"
       exit 0
fi

# Remove the lockfile on exit, even if rsync fails or the script is interrupted
touch "$lockfile"
trap 'rm -f "$lockfile"' EXIT

rsync -azP --delete --exclude=.snapshot --log-file=/var/log/rsync.log /vol-west2/ root@anf-client-east:/vol-east

Of course, in an actual DR event, the replication would have to be stopped and (presumably) resumed in the other direction. This should be included in the DR runbook.
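
As a minimal sketch for such a runbook, assuming the same hosts and volumes as above, the cron job on anf-client-west2 would be disabled and the equivalent command run (or scheduled) in the opposite direction:

# On anf-client-east, after disabling the cron job on anf-client-west2
rsync -azP --delete --exclude=.snapshot --log-file=/var/log/rsync.log /vol-east/ root@anf-client-west2:/vol-west2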

NFS Volume Consolidation

When using ANF for the NFS volumes, the customer wanted to optimize volume size, performance and overall cost. In this scenario, each SAP SID needed less than 100 GiB, which is the minimum ANF volume size. For that reason we suggested consolidating multiple SIDs onto a single volume, in the manner documented below.

 

The ANF volume path is <IP Address>:/vol-west2. In that volume we will create a directory for each SID (in this example, QAS and NW1), and under each of those there will be an ASCS, ERS, sapmnt and SYS directory. These directories have to be created from a VM after the volume is created. Here are the sample directories that we created:

<IP>:/vol-west2/usrsapQAS/sapmntQAS
<IP>:/vol-west2/usrsapQAS/sapmntQASascs
<IP>:/vol-west2/usrsapQAS/sapmntQASsys
<IP>:/vol-west2/usrsapQAS/sapmntQASers
 
<IP>:/vol-west2/usrsapNW1/sapmntNW1
<IP>:/vol-west2/usrsapNW1/sapmntNW1ascs
<IP>:/vol-west2/usrsapNW1/sapmntNW1sys
<IP>:/vol-west2/usrsapNW1/sapmntNW1ers
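
A simple sketch of creating these directories, assuming the volume is mounted at /vol-west2 on one of the VMs:

# Run once as root on a VM with the volume mounted at /vol-west2
for sid in QAS NW1; do
    mkdir -p /vol-west2/usrsap${sid}/sapmnt${sid} \
             /vol-west2/usrsap${sid}/sapmnt${sid}ascs \
             /vol-west2/usrsap${sid}/sapmnt${sid}sys \
             /vol-west2/usrsap${sid}/sapmnt${sid}ers
done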

These directories would be mounted either with the mount command, via automounter configuration, or as cluster filesystem resources. There are really three differences (that I can think of) between this approach and having a separate volume for each mount:

  • The export policy applies to the volume as a whole, so all of the VMs for all SIDs using the volume need access. There was some concern that this reduces security across SIDs somewhat; however, only the root user can mount volumes, and root should be trusted.
  • If one of the SIDs were to fill up the volume, it could impact the others. It would be wise to put monitoring or processes in place to grow the volume when needed (a simple usage-check sketch follows this list).
  • The performance tier/quality of service applies to the consolidated volume as a whole. Since these volumes aren't used all that much, this should give better performance overall, but it would be possible for one SID to consume all of the IOPS/throughput, impacting the others.
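
As a simple illustration of such monitoring (the 80% threshold and the /vol-west2 mount point are assumptions), a small script scheduled via cron could log a warning when the consolidated volume crosses a usage threshold:

#!/bin/bash
# Hypothetical usage check: log a warning when the consolidated volume is over 80% full
threshold=80
usage=$(df --output=pcent /vol-west2 | tail -1 | tr -dc '0-9')
if [ "$usage" -ge "$threshold" ]; then
    logger -t anf-usage "/vol-west2 is ${usage}% full; consider growing the ANF volume"
fi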

This is an example automounter configuration for the QAS instance, placed in the /etc/auto.direct file (replace <anf-vol-ip addr> with the actual IP address of your volume):

/sapmnt/QAS -nfsvers=3,nobind <anf-vol-ip addr>:/vol-west2/usrsapQAS/sapmntQAS
/sapmnt/QAS/SYS -nfsvers=3,nobind <anf-vol-ip addr>:/vol-west2/usrsapQAS/sapmntQASsys
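
For these direct-map entries to take effect, the map also has to be referenced from the automounter master map and the autofs service reloaded; a typical sketch (file locations can vary slightly by distribution) looks like this:

# /etc/auto.master - add a line referencing the direct map
/-    /etc/auto.direct

# Enable and reload the automounter so the new entries are picked up
sudo systemctl enable --now autofs
sudo systemctl reload autofs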

Single region configuration

For scenarios that are within a single region, the configurations above will work fine - however it's also possible to mount both the source and replica volumes from a single Azure VM, and use rsync on that VM to replicate the data between the two volumes.
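
In that single-VM case the transfer no longer goes over ssh; a minimal sketch, assuming the two volumes are mounted at the hypothetical mount points /vol-source and /vol-replica:

rsync -aP --delete --exclude=.snapshot --log-file=/var/log/rsync.log /vol-source/ /vol-replica

The -z compression option is omitted here, since there is no network link to compress.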

Mounting Options

For mounting NFS volumes on Linux VMs, it is preferred to use either the Linux automounter or cluster filesystem resources (when applicable). This is recommended because there is a timing issue in the Linux boot process where /etc/fstab can sometimes be processed before the network stack is fully available. If /etc/fstab is used to mount the NFS volumes at boot, it is possible for the boot to hang, or for the VM to boot with the NFS mounts failed. This happens intermittently, and it isn't specific to this customer's environment.

For systems in a cluster, there are two advantages to having the volumes be cluster filesystem resources:

  • The fstab and network availability issue discussed above is resolved, because cluster resources would always be started after cluster communications have been established.
  • The resource agent for a cluster filesystem resource monitors the availability of the mounted volume.
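
As a hedged illustration only (the resource and group names, the IP address and the paths are placeholders; follow the SAP-on-Azure high availability documentation for your distribution), a Pacemaker filesystem resource for one of these NFS mounts could be created along these lines:

# Hypothetical pcs example (RHEL-style cluster); names, IP and paths are placeholders
sudo pcs resource create fs_QAS_sapmnt Filesystem \
    device='<anf-vol-ip addr>:/vol-west2/usrsapQAS/sapmntQAS' \
    directory='/sapmnt/QAS' fstype='nfs' \
    --group g-QAS_ASCS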

However, for systems that are not in a cluster, the automounter will mount the desired volumes on demand, rather than at boot time.

 

Also, when preparing the mount point directories, it's important to use the chattr +i <mountpoint> command. This makes the underlying mount point directory immutable, so that any attempted writes to the mount point will fail if the NFS volume is not mounted on top of it.
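
A short sketch of preparing one of the mount points this way, using the /sapmnt/QAS path from the QAS example above:

# Create the mount point, then make the underlying directory immutable
sudo mkdir -p /sapmnt/QAS
sudo chattr +i /sapmnt/QAS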

Posted at https://sl.advdat.com/361qb1V