Monday, March 21, 2022

Understanding the Azure Kubernetes Service (AKS) on Azure Stack HCI upgrade experience

We encourage customers running Azure Kubernetes Service (AKS) on Azure Stack HCI to keep their Kubernetes deployments up to date for several reasons, including:

 

  • Updates help fix or remove bugs in prior versions, add new features and improve existing ones. 
  • Updates are important to customer safety and cyber security. The sooner customers update, the sooner they will feel confident that they are secure.   

AKS-HCI follows a monthly release cadence. The release notes for each release list the Kubernetes versions supported in that release, including any minor or patch versions that have become available upstream. Kubernetes uses standard semantic versioning, so each version is numbered in the format [major].[minor].[patch]. Minor version releases include new features and improvements, while patch releases are intended for critical bug fixes and security vulnerabilities within a minor version. We need to distinguish between the following two types of updates in AKS-HCI:

 

  • Update: a fix for security vulnerabilities or blocking functional deficiencies. For example, when a fix is developed for an issue in one of the components, only that component is updated and released; all other components in AKS-HCI remain unchanged.
  • Upgrade: normally moves the platform from one major release to the next. For example, a move from AKS-HCI 1.0 to AKS-HCI 2.0 is an upgrade.
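
The [major].[minor].[patch] numbering described above can be illustrated with a short Python sketch. It is only an illustration and not part of AKS-HCI: the names KubernetesVersion, parse_version and change_kind are hypothetical helpers introduced here. The point it makes is that version parts should be compared as numbers, so that v1.19.11 sorts after v1.19.9.

```python
from typing import NamedTuple


class KubernetesVersion(NamedTuple):
    """Hypothetical helper type holding the three numeric version parts."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> KubernetesVersion:
    """Parse a string such as 'v1.19.9' into its numeric parts."""
    major, minor, patch = text.lstrip("v").split(".")
    return KubernetesVersion(int(major), int(minor), int(patch))


def change_kind(old: str, new: str) -> str:
    """Name the most significant part that differs between two versions."""
    a, b = parse_version(old), parse_version(new)
    if a.major != b.major:
        return "major"
    if a.minor != b.minor:
        return "minor"
    if a.patch != b.patch:
        return "patch"
    return "none"


print(change_kind("v1.19.9", "v1.19.11"))  # patch
print(change_kind("v1.18.17", "v1.19.9"))  # minor
# Numeric comparison: patch 11 is newer than patch 9.
print(parse_version("v1.19.11") > parse_version("v1.19.9"))  # True
```

A plain string comparison would place "v1.19.11" before "v1.19.9", which is why the sketch converts each part to an integer first.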

When updating an AKS-HCI cluster, there are three layers to consider: the PowerShell modules, the management cluster and the workload clusters, as shown in the image below. All nodes in the cluster, including the control plane, load balancer and worker nodes, run as virtual machines (VMs) on either the Mariner Linux OS or, if you are running Windows worker nodes, the Windows Server Core OS.

 

The PowerShell Modules 

  

When a new release is available, it's published in the PowerShell Gallery. You pull it down by running the Update-Module command, which installs the latest modules. If you are using Windows Admin Center, you will see a notification to update the Azure Kubernetes Service extension.

  

The management cluster 

  

The management cluster (also called the AKS host) provides the core orchestration mechanism for deploying and managing workload clusters. Once you have the PowerShell modules installed, updating the AKS host is straightforward: run the Update-AksHci command, which begins the orchestrated upgrade to the latest version of AKS-HCI.

  

Note: The administrator cannot select a specific Kubernetes version for the management cluster. We will always update it to the latest version as soon as the administrator initiates the update of one workload cluster.  

  

The workload clusters 

  

Workload clusters (also known as target clusters), on the other hand, run containerized workloads. Once the AKS host is fully updated, you can start updating workload clusters. You can update only the Kubernetes version, only the OS patch, or both in the same command. However, for the AKS host to be updated, each workload cluster must be running a minor version that is supported in the release. A workload cluster cannot be downgraded once it has been updated to the latest version.

 

 

How upgrade works 

  

Upgrading from version (N-2 or N-1) to version N (most common) 

  

Upgrading from version N-2 or N-1 to version N is normally straightforward, although it requires your target cluster to be running a minor version that is supported in release N. Every release provides an overlap between minor Kubernetes versions so that an upgrade path between releases is always available.

  

When you trigger an update of the AKS-HCI host or an upgrade of a workload cluster, we perform what is called a rolling update. The old-version VM is cordoned off in Kubernetes so that no new workloads are scheduled onto it, and then drained so that its workload containers are redistributed to other worker nodes in the system. The old-version VM is then removed from the Kubernetes cluster, shut down and replaced by a VM with the new version.

  

Tip: We recommend that customers perform capacity planning for their clusters to ensure that enough storage, memory and spare IP addresses are available to allow new VMs to be created. For more information about the requirements, see the public documentation.

  

Once the new VM has been successfully provisioned and joined to the cluster, the old VM is drained (Kubernetes restarts its pods on other available nodes) and finally deleted. The upgrade process then continues with the next node in the queue, and repeats until all VMs are updated.
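
The rolling update described above can be pictured with a small, self-contained simulation. This is a simplified sketch under stated assumptions, not the AKS-HCI or Kubernetes implementation: Node and rolling_update are hypothetical names, pods are plain strings, and "draining" simply moves each pod to the schedulable node that currently holds the fewest pods.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a worker VM."""
    name: str
    version: str
    schedulable: bool = True
    pods: list[str] = field(default_factory=list)


def rolling_update(nodes: list[Node], new_version: str) -> list[Node]:
    """Replace every old-version node, one at a time."""
    cluster = list(nodes)
    for old in [n for n in cluster if n.version != new_version]:
        old.schedulable = False                      # cordon the old VM
        cluster.append(Node(f"{old.name}-new", new_version))  # new VM joins
        targets = [n for n in cluster if n.schedulable]
        for pod in old.pods:                         # drain the old VM
            min(targets, key=lambda n: len(n.pods)).pods.append(pod)
        old.pods = []
        cluster = [n for n in cluster if n is not old]  # delete the old VM
    return cluster


workers = [
    Node("worker-1", "v1.19.9", pods=["web-a", "web-b"]),
    Node("worker-2", "v1.19.9", pods=["db"]),
]
for node in rolling_update(workers, "v1.20.7"):
    print(node.name, node.version, node.pods)
```

Note that during each step the cluster briefly contains one more VM than before, which is why spare memory, storage and IP addresses are needed while the update runs.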

 

Note: If your application runs multiple pod replicas spread across all nodes, you shouldn’t experience any downtime. If your application runs as a singleton (one pod instance), there will be some downtime while Kubernetes restarts the pod on an available node. For more information about application availability, see this article.

 

Upgrading from ‘older versions’ 

 

The Kubernetes community announced that, starting with Kubernetes version 1.19, the support window for Kubernetes versions would increase from 9 months to one year. The community normally releases minor versions roughly every three months. This means that minor version updates will come further apart than before and will be supported for much longer periods. However, patch versions will still be released every 2-3 weeks. In AKS-HCI, we use the terms "older", "outdated" or "unsupported" to refer to clusters running a version that is more than 60 days older than the latest release.

  

Note: If customers are running an older Kubernetes version, they will be asked to upgrade when requesting support for the cluster. Clusters running unsupported Kubernetes releases are not covered by the AKS-HCI support policies.

When you upgrade from an older version, the main goal of the upgrade process is to bring your management cluster to the latest version with zero (or minimal) manual intervention. The upgrade is performed in steps, automatically downloading and installing each interim version you skipped until your management cluster is running the latest version. However, the process is seamless only if your target cluster is running a minor version that is supported in the release you’re trying to upgrade to (typically the latest release). One of the first things the upgrade process checks is that the workload clusters are not running minor versions that have been deprecated.
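
The stepped behaviour described above can be sketched in a few lines of Python. This is an illustration only: release_key and upgrade_steps are hypothetical helpers, and the release numbers are the ones that appear in the example later in this article. The sketch assumes that releases can be ordered by comparing their dotted parts numerically.

```python
def release_key(release: str) -> tuple[int, ...]:
    """Turn '1.0.4.10928' into (1, 0, 4, 10928) for numeric ordering."""
    return tuple(int(part) for part in release.split("."))


def upgrade_steps(available: list[str], current: str, target: str) -> list[str]:
    """List every release after `current`, up to and including `target`."""
    ordered = sorted(available, key=release_key)
    return [
        r for r in ordered
        if release_key(current) < release_key(r) <= release_key(target)
    ]


releases = ["1.0.4.10928", "1.0.3.10901", "1.0.2.10723", "1.0.1.10628"]
print(upgrade_steps(releases, "1.0.1.10628", "1.0.4.10928"))
```

For a management cluster on 1.0.1.10628, the list contains the two interim releases followed by the target release, in the order they would be installed.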

  

Note: If your target cluster is running on a minor version that is not supported by the new release, you must manually upgrade the target cluster to a minor or patch version supported by this release before proceeding with the upgrade.  

 

In this example, AKS-HCI is running the June Update (version 1.0.1.10628) with a workload cluster on Kubernetes version v1.18.17. We want to upgrade to the September Update (version 1.0.4.10928), in which v1.18.17 is already deprecated.

The output shows the recommended steps to take before we can proceed with this upgrade: before upgrading AKS-HCI, the workload cluster must be upgraded from v1.18.17 to v1.19.9 so that it has a path to the September version.

 

PS C:\> Get-AksHciUpdates | ConvertTo-Json
{
  "1.0.4.10928": {
    "Comments": "This is the LATEST Version",
    "SupportedKubernetesVersions": [
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.11; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.7; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.1; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.2; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.11; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.7; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.1; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.2; OS=Windows; IsPreview=False}"
    ],
    "CanUpgradeTo": false,
    "Version": "1.0.4.10928",
    "Recommendation": "Workload Cluster Kubernetes Version v1.18.17 is not in the list of supported Kubernetes versions (v1.19.9 v1.19.11 v1.20.5 v1.20.7 v1.21.1 v1.21.2) for 1.0.4.10928. Please upgrade your target clusters to one of the kubernetes versions supported by 1.0.4.10928 to unblock"
  },
  "1.0.3.10901": {
    "SupportedKubernetesVersions": [
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.11; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.7; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.1; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.2; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.11; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.7; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.1; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.2; OS=Windows; IsPreview=False}"
    ],
    "CanUpgradeTo": false,
    "Version": "1.0.3.10901"
  },
  "1.0.2.10723": {
    "SupportedKubernetesVersions": [
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.11; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.7; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.1; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.11; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.7; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.21.1; OS=Windows; IsPreview=False}"
    ],
    "CanUpgradeTo": false,
    "Version": "1.0.2.10723"
  },
  "1.0.1.10628": {
    "Comments": "This is your CURRENT Version",
    "SupportedKubernetesVersions": [
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.18.14; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.18.17; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.7; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.2; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Linux; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.18.14; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.18.17; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.7; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.19.9; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.2; OS=Windows; IsPreview=False}",
      "@{OrchestratorType=Kubernetes; OrchestratorVersion=v1.20.5; OS=Windows; IsPreview=False}"
    ],
    "CanUpgradeTo": false,
    "Version": "1.0.1.10628"
  }
}
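
To see why v1.19.9 is a workable stepping stone, it can help to compare the version lists from the output above. The Python sketch below is only an illustration, not something AKS-HCI runs: version_key and bridge_versions are hypothetical helpers, and the two lists are the Linux versions copied from the June (1.0.1.10628) and September (1.0.4.10928) entries.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn 'v1.19.11' into (1, 19, 11) so versions sort numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def bridge_versions(current_release: list[str], target_release: list[str]) -> list[str]:
    """Versions supported by both releases, oldest first."""
    return sorted(set(current_release) & set(target_release), key=version_key)


# Linux versions taken from the Get-AksHciUpdates output above.
JUNE = ["v1.18.14", "v1.18.17", "v1.19.7", "v1.19.9", "v1.20.2", "v1.20.5"]
SEPTEMBER = ["v1.19.9", "v1.19.11", "v1.20.5", "v1.20.7", "v1.21.1", "v1.21.2"]

workload = "v1.18.17"
print(workload in SEPTEMBER)             # False
print(bridge_versions(JUNE, SEPTEMBER))  # ['v1.19.9', 'v1.20.5']
```

The workload cluster's v1.18.17 is missing from the September list, which matches the Recommendation text. Of the two versions that both releases support, v1.19.9 is the next minor version after 1.18, so it is the one the example moves to.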

 

 

Resolving failures during upgrades

 

Sometimes an upgrade may be interrupted by an involuntary disruption such as a hardware failure or a virtual machine crash. When such an event occurs, there are two scenarios to be aware of:

 

Revertible stage

 

We describe the revertible stage as the stage in a rolling update before the new appliance VM has been created. Any failure that occurs at this stage reverts the update operation to the previous stable state. This stage typically covers the first 1-5 minutes of the upgrade process, until the new appliance VM has been created. Simply rerun the update command to retry the upgrade.

 

Non-revertible stage

 

We describe this stage as the "point of no return or exit". Any failure that occurs at this stage could leave the cluster in an inconsistent state and potentially affect cluster operations. In this state, the upgrade process hangs and requires intervention or support to fix the issues. Typically, if a customer's upgrade hangs for more than 30-40 minutes, we recommend contacting support.

Note: Cluster API currently has no support for rolling back an update. A feature proposal to close this gap is under discussion. Once that proposal is finalized and implemented in the code, we will pick up that version and implement rollback.

 

Support

 

Microsoft recommends that customers upgrade their on-premises clusters within 60 days. Support for a release typically ends 60 days after its RTM. After 60 days, the support provided to customers is limited and mostly takes the form of helping the customer get to the latest build. A build expires once no customers are running on it. This would occur at the next release. There is no way to recover a build once it has expired.
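
The 60-day window above is simple date arithmetic, sketched here in Python. The helper names support_ends and is_supported, and the dates used, are hypothetical and chosen only for illustration; they are not actual AKS-HCI release dates, and real support decisions follow the published support policy rather than this calculation.

```python
from datetime import date, timedelta

SUPPORT_WINDOW = timedelta(days=60)  # window stated in the text above


def support_ends(release_date: date) -> date:
    """Last day of the 60-day window that starts at the release date."""
    return release_date + SUPPORT_WINDOW


def is_supported(release_date: date, today: date) -> bool:
    """True while `today` still falls inside the 60-day window."""
    return today <= support_ends(release_date)


# Hypothetical release date, for illustration only.
released = date(2022, 1, 1)
print(support_ends(released))                    # 2022-03-02
print(is_supported(released, date(2022, 2, 15)))  # True
print(is_supported(released, date(2022, 3, 21)))  # False
```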

  

Conclusion

To learn more about high availability on AKS-HCI, please visit our documentation for a range of topics.

 

Posted at https://sl.advdat.com/3woFpMx