Advanced Data Solutions : Infrastructure as Code (IaC): Comparing the Tools

When you deploy a server or any other part of your infrastructure manually, how long does it take you? Can you do a manual deployment end to end without any mistakes? Now, how do you scale that? This is where automation comes in, and more specifically Infrastructure as Code (IaC).

In many of the companies I've worked for, it would take days for a server to be deployed. Why? Because there was a ‘process’ and a physical paper checklist that had to be followed, signed off, and checked again. Each person had to complete their task(s) and get them signed off. To get a server deployed, you'd have to configure the VM and the host (networking, storage, etc.), deploy an image, patch the OS, harden the deployment, then install and configure an application. Once that was all done, the server was ready for sign-off and handed over to the customer. That took 3 days.

In some of the environments I managed, I could get most of a complete server/infrastructure deployment done in a few hours with automation, but it was still a very manual process and mistakes were often made. This is when I discovered Infrastructure as Code. Many ask: where do I begin? With all the various choices of tools, which one is best for the job?

Let’s begin by defining what Infrastructure as Code is. Infrastructure as Code (IaC) is the management of infrastructure (networks, virtual machines, load balancers, and connection topology) in a descriptive model, using version control to store the files. You can also watch this awesome one-minute video from the great Abel Wang, What is Infrastructure as Code?

There are a huge number of benefits to using IaC, to name just a few:

Your infrastructure can be stored in a source code repository (GitHub, Azure Repos, etc.), adding governance and versioning and increasing collaboration.
Infrastructure becomes reproducible and you can introduce life cycling into your deployments (implementing CI/CD – Continuous Integration/Continuous Deployment).
Deployments scale without adding manual effort.
Human error is reduced.
The speed and consistency of your infrastructure deployments increase, lowering infrastructure administration costs.
The productivity of your teams increases.

Declarative vs Imperative Methods

When writing your infrastructure as code, it is important to understand the difference between these two methods, because it determines the types of templates you can write and the way in which you will write them.

Declarative languages define the desired state of the target; the system works out what needs to happen to achieve that state. Effectively, you define the end state of the infrastructure, adding the resources that you need along with their configuration, and the IaC tool will figure the rest out.

Imperative languages define the specific commands that must be executed, and the specific order in which they must run, to achieve the desired state.

A declarative example would be: ‘Can I have a cup of coffee on my desk after lunch?’

Whereas an imperative example would be: ‘Go to the coffee machine, add 1 scoop of freshly ground beans and 400ml of water into the correct reservoir, press the start button, allow the coffee to fill the cup. Add in 50ml of fresh 2% milk to the cup and then deliver to my desk at precisely 1pm...’ You get the idea.

An imperative language requires more specific input and can fail during the process if one of the steps is not fulfilled properly for any reason.

A declarative style is great when you need to update your infrastructure or make any changes to it. The imperative style, on the other hand, is good for a deploy-and-forget model, but that isn’t always great if you’re looking to be an agile organization or have a changing infrastructure. The choice really comes down to personal preference and which approach best fits your team's situation.
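To make the distinction concrete, here is a minimal Python sketch. It is not tied to any real IaC tool: the "infrastructure" is just a dictionary, and the resource names and the `reconcile` helper are made up for illustration. The imperative version is a fixed list of steps, while the declarative version states the desired end state and lets a small function work out what needs to change.

```python
# Illustrative only: "infrastructure" is a plain dict of name -> settings.

def deploy_imperatively(infra):
    """Imperative: run specific commands in a specific order."""
    infra["network"] = {"cidr": "10.0.0.0/16"}  # step 1: create the network
    infra["vm"] = {"size": "small"}             # step 2: create the VM
    return infra


def reconcile(infra, desired):
    """Declarative: compare current and desired state, then converge."""
    actions = []
    for name, settings in desired.items():
        if name not in infra:
            actions.append(f"create {name}")
        elif infra[name] != settings:
            actions.append(f"update {name}")
        infra[name] = dict(settings)
    for name in list(infra):
        if name not in desired:
            actions.append(f"delete {name}")
            del infra[name]
    return actions


# The desired end state: same network, but a bigger VM than the script built.
desired_state = {
    "network": {"cidr": "10.0.0.0/16"},
    "vm": {"size": "medium"},
}

current = deploy_imperatively({})
first_run = reconcile(current, desired_state)
second_run = reconcile(current, desired_state)

print(first_run)   # ['update vm']
print(second_run)  # []
```

Running the declarative step a second time does nothing because the infrastructure already matches the description, which is what makes this style comfortable for ongoing changes.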

IaC Tooling: So Many Choices!

There are numerous tools that can be used for IaC. Here are some questions to ask yourself and your team:

What skillsets are already present in the team around specific languages (e.g., C#, Golang, JSON, TypeScript or none of the above – also a valid answer)?
What platform are we deploying onto (on-prem, Azure, AWS, etc.)?
Does an imperative or declarative language make sense?
Are you looking to provision and manage configuration? Or just provision infrastructure?

I’ve listed some of the tools below. I’ll go through each one and describe some pros and cons, hopefully leading you to pick the one that suits you and your team best.

Azure Resource Manager (ARM) Templates

ARM Templates are designed specifically for deployments into Microsoft Azure. If you are looking for a tool for on-premises environments or multiple cloud providers, this isn’t it. ARM is the native IaC templating option for Azure. You can deploy a resource in Azure using the Azure Portal, then download your template so that you can do it again and repeat the process. That is an easier way to get started, but there are some drawbacks.

First, you need to learn JSON, which could be your first hurdle. Also, when you export an ARM template there is quite a bit of boilerplate code that comes with it. ARM, for many people, can be difficult to learn. It can also be hard to know whether what you are deploying is what will actually get deployed: ARM has no Terraform-style state file or ‘plan’ output, although ARM deployments do offer a ‘what-if’ operation that previews changes. ARM has other limitations when it comes to writing IaC; for example, validation and syntax errors can be painful to troubleshoot. ARM templates can also grow to be very large and sometimes unwieldy, which can cause issues in an environment that needs repeatability and scalability.

On the other hand, there are some great learning resources for ARM templates if that is the path you choose:

ARM QuickStart Templates

Microsoft Learn – Create and deploy ARM templates

Pros:

Easy to export a working template from something that has been deployed from the Azure Portal
Fantastic (free) learning resources and QuickStart templates
Azure native so it supports all Azure services from day 0

Cons:

Learning curve around JSON
Templates can get long and unruly
Doesn’t manage the state of your infrastructure, so changes can be breaking

Bicep

Bicep is a domain-specific language (DSL) that allows for declarative deployment of Azure resources, so yes, this is an IaC tool that is native to Azure. Anything that you can do with an ARM template you can do with Bicep (and more!). As soon as a new resource is added into Azure, it is immediately supported by Bicep. Bicep requires a lot less syntax than ARM templates; you can compare the template syntax differences here.

Bicep vs ARM Templates

Bicep allows for the use of modules, which means you create a module for each grouping of resources, creating much more manageable and readable files. It keeps your IaC from getting too big and unruly. Bicep is integrated into the Azure CLI, making the Azure deployment experience really seamless.

One of my favorite features of Bicep is the ‘What-if’ operation. When you pass the what-if argument to a deployment, it compares your template against your current deployment and shows what changes would be applied, allowing you to confirm those changes before they are made. Knowing what you’re about to deploy before you push the button is a great way to validate your results without having to deploy first.

Pros:

Syntax improvements, much simpler than ARM for writing templates
Modules – allows you to create more complex templates much more easily
Resource dependency management is better with Bicep, as it automatically detects resource dependencies

Cons:

Used only with deployments to Azure
Limitations still exist in its capability compared to other tools

Great learning resources with Bicep:

Get started with Bicep

Write your first Bicep Module with Microsoft Learn (and other free learning paths around Bicep)

Barbara Forbes' Blog for Bicep Learnings

Terraform

Terraform is an open-source tool, itself written in Golang, that uses HCL (HashiCorp Configuration Language), which many people find to be one of the most easily learned IaC languages. Terraform comes with a lot of benefits that make it a popular choice.

Terraform can be used with any cloud and with on-prem resources. While each environment requires its own templates, you can use the same language and formatting to deliver IaC to any of them. The reality is that most organizations are multi-cloud and configured in a hybrid model, and this is where Terraform shines.

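# Example: an Azure VM scale set behind a load balancer, plus a jumpbox VM.
# The var.* inputs are assumed to be declared separately (for example in variables.tf).
terraform {
  required_version = ">=0.12"
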
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      version = "~>2.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "vmss" {
 name     = var.resource_group_name
 location = var.location
 tags     = var.tags
}

resource "random_string" "fqdn" {
 length  = 6
 special = false
 upper   = false
 number  = false
}

resource "azurerm_virtual_network" "vmss" {
 name                = "vmss-vnet"
 address_space       = ["10.0.0.0/16"]
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name
 tags                = var.tags
}

resource "azurerm_subnet" "vmss" {
 name                 = "vmss-subnet"
 resource_group_name  = azurerm_resource_group.vmss.name
 virtual_network_name = azurerm_virtual_network.vmss.name
 address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_public_ip" "vmss" {
 name                         = "vmss-public-ip"
 location                     = var.location
 resource_group_name          = azurerm_resource_group.vmss.name
 allocation_method            = "Static"
 domain_name_label            = random_string.fqdn.result
 tags                         = var.tags
}

resource "azurerm_lb" "vmss" {
 name                = "vmss-lb"
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name

 frontend_ip_configuration {
   name                 = "PublicIPAddress"
   public_ip_address_id = azurerm_public_ip.vmss.id
 }

 tags = var.tags
}

resource "azurerm_lb_backend_address_pool" "bpepool" {
 loadbalancer_id     = azurerm_lb.vmss.id
 name                = "BackEndAddressPool"
}

resource "azurerm_lb_probe" "vmss" {
 resource_group_name = azurerm_resource_group.vmss.name
 loadbalancer_id     = azurerm_lb.vmss.id
 name                = "ssh-running-probe"
 port                = var.application_port
}

resource "azurerm_lb_rule" "lbnatrule" {
 resource_group_name            = azurerm_resource_group.vmss.name
 loadbalancer_id                = azurerm_lb.vmss.id
 name                           = "http"
 protocol                       = "Tcp"
 frontend_port                  = var.application_port
 backend_port                   = var.application_port
 backend_address_pool_id        = azurerm_lb_backend_address_pool.bpepool.id
 frontend_ip_configuration_name = "PublicIPAddress"
 probe_id                       = azurerm_lb_probe.vmss.id
}

resource "azurerm_virtual_machine_scale_set" "vmss" {
 name                = "vmscaleset"
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name
 upgrade_policy_mode = "Manual"

 sku {
   name     = "Standard_DS1_v2"
   tier     = "Standard"
   capacity = 2
 }

 storage_profile_image_reference {
   publisher = "Canonical"
   offer     = "UbuntuServer"
   sku       = "16.04-LTS"
   version   = "latest"
 }

 storage_profile_os_disk {
   name              = ""
   caching           = "ReadWrite"
   create_option     = "FromImage"
   managed_disk_type = "Standard_LRS"
 }

 storage_profile_data_disk {
   lun           = 0
   caching       = "ReadWrite"
   create_option = "Empty"
   disk_size_gb  = 10
 }

 os_profile {
   computer_name_prefix = "vmlab"
   admin_username       = var.admin_user
   admin_password       = var.admin_password
   custom_data          = file("web.conf")
 }

 os_profile_linux_config {
   disable_password_authentication = false
 }

 network_profile {
   name    = "terraformnetworkprofile"
   primary = true

   ip_configuration {
     name                                   = "IPConfiguration"
     subnet_id                              = azurerm_subnet.vmss.id
     load_balancer_backend_address_pool_ids = [azurerm_lb_backend_address_pool.bpepool.id]
     primary                                = true
   }
 }

 tags = var.tags
}

resource "azurerm_public_ip" "jumpbox" {
 name                         = "jumpbox-public-ip"
 location                     = var.location
 resource_group_name          = azurerm_resource_group.vmss.name
 allocation_method            = "Static"
 domain_name_label            = "${random_string.fqdn.result}-ssh"
 tags                         = var.tags
}

resource "azurerm_network_interface" "jumpbox" {
 name                = "jumpbox-nic"
 location            = var.location
 resource_group_name = azurerm_resource_group.vmss.name

 ip_configuration {
   name                          = "IPConfiguration"
   subnet_id                     = azurerm_subnet.vmss.id
   private_ip_address_allocation = "Dynamic"
   public_ip_address_id          = azurerm_public_ip.jumpbox.id
 }

 tags = var.tags
}

resource "azurerm_virtual_machine" "jumpbox" {
 name                  = "jumpbox"
 location              = var.location
 resource_group_name   = azurerm_resource_group.vmss.name
 network_interface_ids = [azurerm_network_interface.jumpbox.id]
 vm_size               = "Standard_DS1_v2"

 storage_image_reference {
   publisher = "Canonical"
   offer     = "UbuntuServer"
   sku       = "16.04-LTS"
   version   = "latest"
 }

 storage_os_disk {
   name              = "jumpbox-osdisk"
   caching           = "ReadWrite"
   create_option     = "FromImage"
   managed_disk_type = "Standard_LRS"
 }

 os_profile {
   computer_name  = "jumpbox"
   admin_username = var.admin_user
   admin_password = var.admin_password
 }

 os_profile_linux_config {
   disable_password_authentication = false
 }

 tags = var.tags
}

Terraform builds resources, makes changes, and can reference existing resources using a state file. Terraform is easily readable and uses modules to organize your code and call your resources. While Terraform is a declarative language, it relies on the state file to know what has already been deployed and what needs to change. Managing the state file does introduce other topics (security, access, etc.), but these are well covered in the documentation. Learn more about Terraform state files here.

Terraform has great built-in features to validate your code, run a ‘plan’ so you know exactly which elements are going to change before they change, and trace what was deployed. Terraform shines when you want to continuously deploy your infrastructure; it can even deploy to different environments using workspaces.
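The idea behind a ‘plan’ can be sketched in a few lines of Python. This is only an illustration of the concept, not Terraform's implementation or its state file format: the `plan` and `apply_changes` functions and the resource names are hypothetical, and the state is a small JSON string standing in for a real state file.

```python
import json


def plan(state_json, config):
    """Compare recorded state with the configuration; changes nothing."""
    state = json.loads(state_json)
    return {
        "add": sorted(set(config) - set(state)),
        "change": sorted(
            name for name in config
            if name in state and state[name] != config[name]
        ),
        "destroy": sorted(set(state) - set(config)),
    }


def apply_changes(config):
    """Pretend to deploy, then return the new state as JSON."""
    return json.dumps(config, sort_keys=True)


# What was deployed last time (the recorded state).
state_file = json.dumps({
    "vnet": {"cidr": "10.0.0.0/16"},
    "old-vm": {"size": "small"},
})

# What the configuration files describe now.
config = {
    "vnet": {"cidr": "10.0.0.0/16"},
    "lb": {"sku": "basic"},
}

preview = plan(state_file, config)
print(preview)  # {'add': ['lb'], 'change': [], 'destroy': ['old-vm']}

state_file = apply_changes(config)
print(plan(state_file, config))  # {'add': [], 'change': [], 'destroy': []}
```

The preview is produced without touching anything, and once the changes are applied and the state is updated, a second plan reports nothing left to do.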

Pros:

Multi-cloud capability
Syntax that is easy to write and understand, and a tool that is easy to set up and deploy
Built in features to show what is deploying before it is deployed, as well as validation and formatting.

Cons:

New services in Azure aren’t always available to deploy using Terraform
Declarative languages rely on dependency mapping when deploying (for example, deploying a VM before its networking exists will error out)
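The dependency mapping mentioned in the cons above comes down to ordering resources so that each one is created after the things it depends on. Here is a minimal Python sketch of that idea using the standard library's graphlib module (Python 3.9+). The resource names are a simplified, hypothetical subset loosely based on the Terraform example earlier, and this is not how Terraform itself is implemented.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
dependencies = {
    "resource_group": set(),
    "virtual_network": {"resource_group"},
    "subnet": {"virtual_network"},
    "network_interface": {"subnet"},
    "virtual_machine": {"network_interface"},
}

# static_order() yields every resource after all of its dependencies.
deploy_order = list(TopologicalSorter(dependencies).static_order())
print(deploy_order)
```

With this chain of dependencies the virtual machine always comes last, after its networking, no matter what order the resources were written in.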

Terraform on Azure Video

Terraform on Azure Blog - covering the basics into modules and state files

Generate your first Terraform template with NubesGen

Terraform on Azure Documentation

Pulumi

Pulumi is another IaC tool that uses a declarative format to deploy your infrastructure. The biggest differentiator with Pulumi is that it allows you to write your IaC in the language that your organization or team knows best. Pulumi supports TypeScript, JavaScript, Python, Go and C#, which means that you write your templates in the language that you are comfortable with.

As another bonus, you can use the testing tools native to that language to test your code. Testing is crucial. We not only want to deploy our infrastructure as code to automate tasks and increase our velocity, but we also need to reduce human error, which makes testing a key part of the development and deployment lifecycle.
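As a rough illustration of what testing infrastructure code in a general-purpose language can look like, here is a plain Python sketch. It deliberately does not use the Pulumi SDK: `build_vm_definitions` is a hypothetical helper that returns simple dictionaries, and the test is an ordinary assert-based function of the kind a test runner such as pytest could collect.

```python
def build_vm_definitions(environment, count):
    """Build a list of VM definitions (plain dicts) for one environment."""
    return [
        {
            "name": f"{environment}-web-{index}",
            "size": "Standard_DS1_v2",
            "tags": {"env": environment},
        }
        for index in range(1, count + 1)
    ]


def test_every_vm_is_named_and_tagged():
    """Check the definitions before anything is deployed."""
    vms = build_vm_definitions("dev", 3)
    assert len(vms) == 3
    assert vms[0]["name"] == "dev-web-1"
    assert all(vm["tags"]["env"] == "dev" for vm in vms)


test_every_vm_is_named_and_tagged()
```

The point is that mistakes such as a missing tag or a wrong naming pattern can be caught by a quick test run rather than by a failed or incorrect deployment.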

Pulumi, like Terraform, supports ANY cloud. It has another huge benefit: it can coexist with, or convert, your existing templates from Terraform, ARM, Helm/YAML, etc. into Pulumi.

Pros:

Pulumi allows for easy adoption with a more familiar language and allows the conversion of existing templates

Allows for IaC adoption in a language that works for your team

Cons:

If you don’t have ANY experience in any languages, you will need to choose one and skill up.

Video on deploying to Azure using Pulumi

Ansible

Ansible is an imperative IaC tool that not only provisions your infrastructure but also manages the configuration of your services. The tools above do not; another third-party tool would be required. Ansible relies heavily on YAML files, in the form of Ansible Playbooks, to define your infrastructure, and Ansible itself is written in Python. Playbooks describe your automation tasks from deployment to ongoing state, making it an all-in-one solution.

Ansible does not maintain state and does not keep track of dependencies. Ansible is fairly easy to get started with but does have less of a community feel when looking for troubleshooting tips or self-help.

Pros:

Simple to learn, as Playbooks are written in easily understood YAML and Ansible itself is built on Python
Ansible is agentless, decreasing maintenance and performance degradations

Cons:

Lack of state, meaning it doesn’t track dependencies. It executes tasks sequentially and stops when a task fails or encounters an error.
Lack of enterprise support and community feel for troubleshooting.
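The sequential, stop-on-error behavior described in the cons above can be pictured with a short Python sketch. This is not Ansible's code or its playbook format: `run_playbook` and the three task functions are hypothetical stand-ins used only to show the control flow.

```python
def run_playbook(tasks):
    """Run tasks in order and stop at the first one that fails."""
    results = []
    for name, action in tasks:
        try:
            action()
            results.append((name, "ok"))
        except Exception as error:
            results.append((name, f"failed: {error}"))
            break  # later tasks are never attempted
    return results


def install_package():
    pass  # pretend the package installed cleanly


def start_service():
    raise RuntimeError("service not found")


def open_firewall():
    pass  # never reached in this example


results = run_playbook([
    ("install package", install_package),
    ("start service", start_service),
    ("open firewall", open_firewall),
])
print(results)
# [('install package', 'ok'), ('start service', 'failed: service not found')]
```

Because nothing records what already succeeded, a rerun starts again from the first task, which is why tasks in this model need to be safe to repeat.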

Chef

Chef is an open-source IaC tool that can run on multiple platforms (Windows, Linux, AWS, Azure, etc.) and uses cookbooks and recipes to define not only your deployment templates but also the configuration of your environment. Chef uses a Ruby-based DSL, so there is a dedicated set of programming skills to learn. Chef requires its own infrastructure to run on, which brings a licensing and infrastructure cost that needs to be considered. It also means that Chef runs in a dedicated environment and requires an agent on every machine that you are deploying to.

Because Chef requires a lot of other considerations outside of just the capability of the product, I am only going to list the pros and cons here; it very much requires more consideration than infrastructure as code alone.

Pros:

Scalable, easily handles a large infrastructure

Extensive collections of configuration and module recipes

Cons:

Requirement to learn Ruby, be ready for a steep learning curve
Complexity and overhead management, difficult to install

Puppet

Puppet and Chef often get roped together when comparing IaC tools as they’ve both been around for some time. Puppet uses its own declarative language, PuppetDSL, to deploy and maintain system configuration in the form of manifests and modules.

Puppet also requires an infrastructure to run on, deploying agents on every machine that you are deploying and managing. As Puppet also requires a lot of other considerations outside of just the capability of the product, it’s not one that is as popular in Azure when there are more cost-effective options.

Pros:

Scalable, easily handles a large infrastructure
Well-established support community
Powerful reporting capabilities

Cons:

Requirement to learn PuppetDSL
Complexity and overhead management, difficult to install

In Summary

Choosing an Infrastructure as Code tool is a decision that requires thought, and every organization needs to compare the pros and cons for itself. There is no one-size-fits-all solution for any person or any company. Take your time, read through the options and find the best solution for you. Once you choose your preferred IaC tool, make sure you start looking at how to automate not only your infrastructure, but also your delivery process with a solid continuous integration/continuous delivery (CI/CD) tool.

Happy coding!

Posted at https://sl.advdat.com/3vc1hKu

Thursday, February 24, 2022
