Friday, March 11, 2022

Azure Hybrid Cloud Lab Environment

Problem

During the pandemic, all schools moved online. Students have to use their own PCs at home, and they face the following problems:

  1. Inconsistent Learning Environment - Students' PCs differ in hardware, OS type (Mac, Windows 7/10/11), and software. Teachers cannot teach IT courses easily, and unknown problems keep happening. Most of the lesson time is spent helping students debug their platforms.
  2. Software Licensing - Schools cannot give license keys to students to install the software at home. In the past, students shared the lab and the number of licenses matched the number of lab computers, but it is not feasible to buy licenses for every student to install all the software at home.
  3. Software that cannot run in the cloud - Some courses teach students to install an operating system or a hypervisor to create a private cloud. It is extremely hard or expensive to do that in the cloud. Students' PCs sometimes cannot run a hypervisor because virtualization technology is disabled or not supported, or because of resource limitations.
  4. Direct Remote solution - All the existing solutions assume that the PC is owned by one person, but lab computers need to be shared by different students in different periods. For example, through VPC, a student can keep using all computers and no one else can log in.

 

Solution

Azure Hybrid Cloud Lab Environment is a serverless Azure IoT application that establishes a private remote channel to let students use the physical lab computers from home. The physical lab is consistent: the same hardware, authorized software, and full support for running virtualization software. It builds on top of the Azure Cloud Lab Environment project and manages the private remote channel of each lab computer according to the class schedule.

 

How does the private remote channel work?

cyruswong_0-1646814906442.png

 

On the lab computer, a .NET 6 Windows service (IoTSshClientService) connects to the SSH bastion container and creates a remote port forwarding. The equivalent SSH command is “ssh -R 3389:localhost:3389 <SSH bastion ip>”. It bridges the lab computer's Remote Desktop service to port 3389 of the SSH bastion container.

The student's home PC connects to the SSH bastion container and creates a local port forwarding. The equivalent SSH command is “ssh -L 3389:localhost:3389 <SSH bastion ip>”. It bridges port 3389 on the student's home PC to port 3389 of the SSH bastion container. We recommend that students use Bitvise SSH Client because it uses a dynamic local port, so it will not conflict if the student's home PC is already running a Remote Desktop server. Also, students can connect to Remote Desktop with one click by default.
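The two tunnel directions can be sketched in Python by building the equivalent OpenSSH command lines. This is only an illustration: the function names and the host name bastion.example.com are hypothetical, and the real project uses a .NET service on the lab PC and Bitvise on the home PC rather than these helpers.

```python
import shlex


def remote_forward_command(bastion_host: str, port: int = 3389) -> list[str]:
    """Lab PC side: publish the local Remote Desktop port on the bastion (ssh -R)."""
    return ["ssh", "-R", f"{port}:localhost:{port}", bastion_host]


def local_forward_command(bastion_host: str, local_port: int = 3389,
                          bastion_port: int = 3389) -> list[str]:
    """Home PC side: map a local port to the bastion port (ssh -L)."""
    return ["ssh", "-L", f"{local_port}:localhost:{bastion_port}", bastion_host]


if __name__ == "__main__":
    bastion = "bastion.example.com"  # hypothetical host name
    print(shlex.join(remote_forward_command(bastion)))
    # A different local port avoids a clash with a Remote Desktop server
    # that may already be listening on 3389 on the home PC.
    print(shlex.join(local_forward_command(bastion, local_port=13389)))
```

With a non-default local port such as 13389, the student would point the Remote Desktop client at localhost:13389.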

cyruswong_1-1646814906445.png   cyruswong_2-1646814906445.png

IoTSshClientService is a .NET 6 Windows service that connects to Azure IoT Hub and manages the SSH connection to the SSH bastion container. IoT Hub sends messages to it, and it switches the SSH connection to a different bastion for each student. The SSH bastion container is ephemeral and only exists for one student's class period. As a result, disconnecting the SSH connection and destroying the SSH bastion container ensures that students can only access the lab PC during their lab class session, and the lab PC will then connect to a new SSH bastion for another student.

For VNC, students need to add a client-to-server port forwarding rule.

cyruswong_0-1646889774224.png

 

And connect it through localhost.

cyruswong_1-1646889774228.png

The VNC server needs to enable local loopback.

 

Video demo

How to remote your lab PC from Windows? 

How to remote your lab PC from Mac?

How to remote your lab PC from Android?

For a macOS lab computer, you need to enable VNC remote control and run the osx-x64 version of the client.

Setup and demo remote Lab MacOS Computer over SSH Tunnel with VNC client

 

Architecture

Azure Hybrid Cloud Lab Environment.drawio.png

 

GetDeviceConnectionStringFunction

Each lab computer has IoTSshClientService installed and running. At startup, it sends a POST request to the HTTPS endpoint with function key authorization. If the computer is new, the function registers it, creates a new device in Azure IoT Hub, and adds tags such as Location and MAC address to the device twin. If it is not a new computer, the function retrieves the registered computer record. In both cases, it returns the unique device connection string to IoTSshClientService. The reason for this design is to simplify deployment, as it is hard to manage a separate deployment package per lab PC. IoTSshClientService implements exponential backoff when the API returns an error. The computer information is stored in the Azure table Computer.
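A minimal Python sketch of the two ideas in this section, register-or-retrieve on the function side and exponential backoff on the client side, could look like the following. It is not the project's code: the in-memory dictionary stands in for the Computer table, the connection string is a placeholder, and the function names and delay values are assumptions.

```python
import time
from typing import Callable

# In-memory stand-in for the "Computer" table, keyed by (location, MAC address).
computers: dict[tuple[str, str], str] = {}


def get_device_connection_string(location: str, mac_address: str) -> str:
    """Register the computer on first contact, otherwise reuse its record."""
    key = (location, mac_address)
    if key not in computers:
        # Placeholder only: a real connection string is issued by Azure IoT Hub.
        computers[key] = f"placeholder-connection-string-{len(computers) + 1}"
    return computers[key]


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   maximum: float = 60.0, retries: int = 5) -> list[float]:
    """Delays that grow geometrically and are capped at `maximum` seconds."""
    return [min(base * factor ** attempt, maximum) for attempt in range(retries)]


def call_with_backoff(request: Callable[[], str], delays: list[float],
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `request`, sleeping through `delays` between failed attempts."""
    for delay in delays:
        try:
            return request()
        except Exception:
            sleep(delay)
    return request()  # final attempt: let any error propagate to the caller
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.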

AddSshConnectionFunction

It serves as the lifecycle hook, over HTTPS with function key authorization, for the CREATED, DELETING, and DELETED events of the SSH bastion from Azure Cloud Lab Environment. For a DELETING or DELETED event, it sets the computer reservation to false and sets the desired “session” in the device twin to an empty string, which makes sure that IoTSshClientService on the lab PC will close the SSH connection even if it is offline now. It also calls the cloud-to-device method OnRemoveSshMessage to disconnect the SSH connection immediately if IoTSshClientService is online. For the DELETING case, it sends an email to inform the student that the lab PC is no longer available.

For a CREATED event, it saves the SSH connection in an Azure table and adds the SSH connection to the allocate-pc storage queue.
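The event handling described above can be sketched with in-memory stand-ins for the tables, queue, device twins, and mailbox. Everything except the event names, the allocate-pc queue, the “session” property, and OnRemoveSshMessage is hypothetical, including the `assignments` mapping that links an SSH connection to its lab PC.

```python
from dataclasses import dataclass, field


@dataclass
class LabState:
    """In-memory stand-ins for the Azure tables, queue, twins and mailbox."""
    connections: dict = field(default_factory=dict)      # connection id -> student email
    assignments: dict = field(default_factory=dict)      # connection id -> computer id (assumed link)
    is_reserved: dict = field(default_factory=dict)      # computer id -> IsReserved
    desired_session: dict = field(default_factory=dict)  # computer id -> twin desired "session"
    allocate_pc_queue: list = field(default_factory=list)
    method_calls: list = field(default_factory=list)     # (computer id, method name)
    emails: list = field(default_factory=list)           # (address, message)


def handle_lifecycle_event(state: LabState, event: str, connection_id: str,
                           student_email: str = "") -> None:
    if event == "CREATED":
        # Save the connection and queue it for PC allocation.
        state.connections[connection_id] = student_email
        state.allocate_pc_queue.append(connection_id)
        return
    if event not in ("DELETING", "DELETED"):
        raise ValueError(f"unknown event: {event}")
    computer_id = state.assignments.get(connection_id)
    if computer_id is not None:
        state.is_reserved[computer_id] = False
        # An empty desired session closes the tunnel even if the PC is offline now.
        state.desired_session[computer_id] = ""
        # The direct method disconnects immediately when the PC is online.
        state.method_calls.append((computer_id, "OnRemoveSshMessage"))
    if event == "DELETING":
        address = state.connections.get(connection_id, student_email)
        state.emails.append((address, "Your lab PC is no longer available."))
```

Because releasing a PC only writes fixed values, handling DELETED after DELETING is harmless in this sketch.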

RunAllocatePcFunction

It is triggered by an allocate-pc queue message. It gets the lab location from the SSH connection and searches for a free lab PC with the query "PartitionKey eq '{location}' and IsOnline eq true and IsReserved eq false".

If there are free lab PCs, it will:

  1. Pick one at random.
  2. Update the computer record with optimistic concurrency, setting IsReserved to true and the email to the student email address from the SSH connection.
  3. Call the cloud-to-device method OnNewSshMessage to immediately update the SSH connection in IoTSshClientService if it is online, and then set the twin desired “session” to the SSH session, which enables the lab PC to reconnect the SSH session if it is rebooted.
  4. If step 3 succeeds, update the SSH connection status to “ASSIGNED” and email the SSH connection settings to the student. If step 3 does not succeed, roll back step 2 and throw NoFreePcException, which causes RunAllocatePcFunction to retry up to 3 times with an exponential backoff of 1 to 5 minutes. The backoff and retry mechanism handles the case where a PC is rebooting or offline.

If there is no free lab PC, it throws NoFreePcException.
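The allocation steps can be sketched in Python as follows. This is an in-memory illustration under assumptions, not the project's implementation: the `etag` counter imitates the optimistic concurrency check of a table store, and `notify_device` is a hypothetical callable standing in for the OnNewSshMessage call and twin update.

```python
import random
from dataclasses import dataclass


class NoFreePcException(Exception):
    """Raised so that the queue trigger retries the message later."""


@dataclass
class Computer:
    location: str
    mac_address: str
    is_online: bool = True
    is_reserved: bool = False
    etag: int = 0  # version counter used for optimistic concurrency


def free_pc_filter(location: str) -> str:
    """Build the table query quoted in the text."""
    return f"PartitionKey eq '{location}' and IsOnline eq true and IsReserved eq false"


def try_reserve(computer: Computer, expected_etag: int) -> bool:
    """Reserve only if nobody changed the record since it was read."""
    if computer.etag != expected_etag or computer.is_reserved:
        return False
    computer.is_reserved = True
    computer.etag += 1
    return True


def allocate_pc(computers, location, notify_device, rng=random) -> Computer:
    free = [c for c in computers
            if c.location == location and c.is_online and not c.is_reserved]
    if not free:
        raise NoFreePcException(location)
    chosen = rng.choice(free)            # step 1: random pick
    etag_when_read = chosen.etag
    if not try_reserve(chosen, etag_when_read):   # step 2: reserve
        raise NoFreePcException(location)
    if not notify_device(chosen):        # step 3: tell the lab PC
        chosen.is_reserved = False       # step 4: roll back step 2
        chosen.etag += 1
        raise NoFreePcException(location)
    return chosen
```

Raising the same exception for "no free PC" and "PC did not respond" lets one retry policy cover both cases, which matches the behaviour described above.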

RunAllocatePcPoisonFunction

It is triggered by an allocate-pc-poison queue message. After several failed retries of RunAllocatePcFunction, the SSH connection is added to the allocate-pc-poison storage queue. The function updates the SSH connection status to "NO_PC_AVAILABLE" and sends an email to the student.

IotHubTriggerFunction

Azure IoT Hub routes device messages to Azure Event Hub, which triggers IotHubTriggerFunction when the state of a lab PC changes. It extracts the device ID from the event message and gets the device twin from the Azure IoT registry. If the twin is not found, it takes no action because the device has been removed from IoT Hub. From the twin, it reads the location and MAC address tags and retrieves the computer record from the Azure table. If the computer does not exist, it takes no action because the computer has been deleted from the system. There are 3 types of operation:

  1. Device connected - update the computer record IsOnline to true.
  2. Device disconnected - update the computer record IsOnline to false and IsConnected to false.
  3. Update Twin - check the 2 twin reported properties “isConnected” and “lastErrorMessage” and update the computer record. The error message is also saved in the ComputerErrorLog Azure table.
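The three operations can be sketched as one small state update in Python. The operation labels "deviceConnected", "deviceDisconnected", and "updateTwin" are assumptions used for illustration, and the record and log are in-memory stand-ins for the Computer and ComputerErrorLog tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputerRecord:
    is_online: bool = False
    is_connected: bool = False
    last_error_message: str = ""


def apply_device_event(record: Optional[ComputerRecord], operation: str,
                       reported: Optional[dict] = None,
                       error_log: Optional[list] = None) -> bool:
    """Apply one device event; return False when no action is taken."""
    if record is None:
        return False  # computer was deleted from the system
    reported = reported or {}
    if operation == "deviceConnected":
        record.is_online = True
    elif operation == "deviceDisconnected":
        record.is_online = False
        record.is_connected = False
    elif operation == "updateTwin":
        if "isConnected" in reported:
            record.is_connected = bool(reported["isConnected"])
        message = reported.get("lastErrorMessage")
        if message:
            record.last_error_message = message
            if error_log is not None:
                error_log.append(message)  # stand-in for ComputerErrorLog
    else:
        return False  # unknown operation: ignore
    return True
```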

IoTSshClientService

Some details have been discussed in the previous sections. It is a long-running loop with a dynamic delay for each iteration. The delay increases if there is any error from Azure IoT Hub or GetDeviceConnectionStringFunction. It reports the SSH connection status and error message through device twin reported properties, and it follows the twin desired “session” to maintain the SSH connection.
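One iteration of such a loop boils down to two decisions: how long to wait next, and what to do with the SSH connection given the desired “session”. The Python sketch below shows both as pure functions. The delay values, the doubling rule, and the function names are assumptions, since the actual service is written in .NET and its timing is not described here.

```python
def next_delay(current: float, had_error: bool,
               base: float = 5.0, maximum: float = 300.0) -> float:
    """Double the delay after an error (capped), reset it after a success."""
    return min(current * 2, maximum) if had_error else base


def reconcile_session(active_session: str, desired_session: str) -> tuple[str, str]:
    """Compare the active tunnel with the twin desired "session".

    Returns (action, new_active_session), where action is one of
    "keep", "disconnect", or "connect".
    """
    if desired_session == active_session:
        return "keep", active_session
    if not desired_session:
        return "disconnect", ""
    # "connect" replaces any existing tunnel with the desired one.
    return "connect", desired_session
```

Because the desired session lives in the device twin, a rebooted lab PC can run the same comparison at startup and reconnect without a new message from the cloud.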

 

Deployment

On the Azure side, the project uses Terraform, and deployment needs just one command if you have already set up Azure CLI and Terraform.

For the lab computer, first create appsettings.json by renaming appsettings.template, and replace the values of AzureFunctionBaseUrl, AzureFunctionBaseUrl, and Location, which is the location of your calendar event.

cyruswong_4-1646814906462.png

 

Build the debug or release configuration, and do not publish it, because publishing has an unknown bug and the published DLL cannot start at this moment. Copy all files to the lab PC and create the Windows service.

For the last step, there are 2 ways:

  1. Manually copy the files to each lab computer and run “install.ps” in an administrator PowerShell.
  2. If your lab PCs support SSH connections, you can use the Ansible script InstallIoTSshClientService.yaml (Recommended).

 

Operational Problems:

  1. The system relies on the recovery solution for the lab PCs. On my campus, all lab PCs are restored to a snapshot with a reborn card.
  2. Powering lab PCs on and off - you need to check your Wake-on-LAN solution.
  3. If your lab PCs do not support Wake-on-LAN, you may try setting the BIOS to wake at eight and use Ansible to shut them down remotely every night.
  4. Students may shut down a PC that has no Wake-on-LAN. You can reduce this risk by removing the shutdown and sleep buttons so that students cannot shut down or sleep the lab PC remotely. You can use the Ansible script HideShutdownAndSleepButton.yaml.
    from cyruswong_5-1646814906464.png to cyruswong_6-1646814906467.png

Source Code

Azure Hybrid Cloud Lab Environment

https://github.com/wongcyrus/AzureHybridCloudLabEnvironment

Azure Cloud Lab Environment

https://github.com/wongcyrus/AzureCloudLabEnvironment

Example AzureCloudLabInfrastructure Repo

https://github.com/wongcyrus/AzureCloudLabInfrastructure

  1. main branch is a simple demo to create resource group.
  2. windows11 branch is running a windows11.
  3. bastion branch is an SSH bastion in an Azure container instance, and it is used by the Azure Hybrid Cloud Lab Environment project.

Conclusions

Educational institutions have invested in and relied on physical computer labs for many years. Although schools can migrate to cloud labs and provide a tailor-made virtual machine for each student, that may not be a good fit for all courses because of software licensing, virtualization, and cost. In fact, schools must prepare for both online and face-to-face lab classes in the long run, and the hybrid way is the direction. The resources and investment in our physical labs are wasted if students cannot use the lab computers from home. I hope this project can help schools make good use of their physical lab computers during the pandemic.

Project collaborators include Andy Lum, Jerry Lam, Fong Ho Luen, Jenny Nga, and Wina Yu from the IT114115 Higher Diploma in Cloud and Data Centre Administration.

About the Author

cyruswong_7-1646814923643.jpeg

 

Cyrus Wong is a senior lecturer in the Department of Information Technology (IT) of the Hong Kong Institute of Vocational Education (Lee Wai Lee), and he focuses on teaching public cloud technologies. He is a Microsoft Learn for Educators Ambassador.

Posted at https://sl.advdat.com/3tORZSt