As continuous integration and continuous delivery (CI/CD) culture has grown in popularity, it has brought the challenge of automating everything, with the aim of making processes easier and more maintainable for everyone.
One of the most valuable aspects of CI/CD is the integration of the Infrastructure as Code (IaC) concept. With IaC we can version our infrastructure, save money, and create new environments in minutes, among many other benefits. I won't go deeper into IaC here, but if you want to learn more, visit: The benefits of Infrastructure as Code
IaC can also bring some challenges when creating the resources a project needs. This is mostly because writing all the infrastructure scripts is a task usually assigned to the infrastructure engineers, and sometimes, for whatever reason, they are not available to help us.
As a Data Engineer, I would like to help you understand the CI/CD process with a hands-on example. You'll learn how to create an Azure Databricks workspace through Terraform and Azure DevOps, whether you are creating projects by yourself or supporting your infrastructure team.
In this article, you'll learn how to integrate Azure Databricks with Terraform and Azure DevOps. The main reason I wrote it is that I had some difficulty finding information that covers these 3 technologies together.
First of all, you'll need some prerequisites:
- Azure Subscription
- Azure Resource Group (you can use an existing one)
- Azure DevOps account
- Azure Storage Account with a container named "tfstate"
- Visual Studio Code (optional; any editor will do)
So, let's start and have some fun.
Please go ahead and download or clone the GitHub repository databrick-tf-ado and check out the demo-start branch.
In the root folder you'll see a file named main.tf, and 2 more files in the folder modules/databricks-workspace.
Note that this example is a basic one; you can find more information about all the Databricks features in the provider documentation: https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs
Now, go to the main.tf file in the root folder and find line 8, where the declaration of the azurerm backend starts:
backend "azurerm" {
  resource_group_name  = "demodb-rg"
  storage_account_name = "demodbtfstate"
  container_name       = "tfstate"
  key                  = "dev.terraform.tfstate"
}
There you need to change the values of resource_group_name and storage_account_name to match your subscription. You can find those values in the Azure Portal; both resources must already exist.
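To show where that backend declaration sits, here is a minimal sketch of what the top of a root main.tf like this one typically looks like. It is an assumption, not a copy of the repository file: the provider list and the lack of version pins are illustrative, and only the backend values come from the snippet above.

```hcl
# Sketch of the top of a root main.tf (assumed layout, not the repository's exact file).
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
    databricks = {
      # Source address matching the documentation link in this article;
      # newer provider releases may be published under a different namespace.
      source = "databrickslabs/databricks"
    }
  }

  # Remote state is stored in the storage account from the prerequisites.
  backend "azurerm" {
    resource_group_name  = "demodb-rg"     # replace with your resource group
    storage_account_name = "demodbtfstate" # replace with your storage account
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }
}

# The azurerm provider requires a features block, even if it is empty.
provider "azurerm" {
  features {}
}
```

Backend blocks cannot use variables, which is why these two values have to be edited directly in the file.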
The main.tf file in the root folder references a module called "databricks-workspace". In that module's folder you can see 2 more files: main.tf and variables.tf.
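For orientation, a module reference of this kind usually looks like the sketch below. The input names (resource_group_name, workspace_name, location) and the region are hypothetical; the real inputs are whatever the module's variables.tf declares.

```hcl
# Sketch of a module call in the root main.tf (input names are hypothetical).
module "databricks-workspace" {
  # Relative path to the module folder described in this article.
  source = "./modules/databricks-workspace"

  # Assumed inputs; check variables.tf in the module for the actual names.
  resource_group_name = "demodb-rg"
  workspace_name      = "demodb-workspace"
  location            = "eastus"
}
```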
The module's main.tf contains the definitions, in the format Terraform requires, to create a Databricks workspace, a cluster, a secret scope, a secret and a notebook. variables.tf holds the values that could change depending on the environment.
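As a rough illustration of those five resources, the sketch below shows one way such a module could be written. It is not the repository's code: every name and value is a placeholder, and argument names for the Databricks provider have changed between versions, so check the documentation for the version you install.

```hcl
# variables.tf (hypothetical variable names and defaults)
variable "resource_group_name" {
  type = string
}

variable "workspace_name" {
  type    = string
  default = "demodb-workspace"
}

variable "location" {
  type    = string
  default = "eastus"
}

# main.tf (illustrative sketch)
resource "azurerm_databricks_workspace" "this" {
  name                = var.workspace_name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "standard"
}

# Point the Databricks provider at the workspace created above.
provider "databricks" {
  azure_workspace_resource_id = azurerm_databricks_workspace.this.id
}

# Look up a runtime version and a node type instead of hard-coding them.
data "databricks_spark_version" "lts" {
  long_term_support = true
}

data "databricks_node_type" "smallest" {
  local_disk = true
}

resource "databricks_cluster" "this" {
  cluster_name            = "demo-cluster" # placeholder name
  spark_version           = data.databricks_spark_version.lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20
  num_workers             = 1
}

resource "databricks_secret_scope" "this" {
  name = "demo-scope" # placeholder name
  # Assumption: on the standard tier the scope must be manageable by all users.
  initial_manage_principal = "users"
}

resource "databricks_secret" "this" {
  key          = "demo-key"          # placeholder key
  string_value = "placeholder-value" # never commit real secrets
  scope        = databricks_secret_scope.this.name
}

resource "databricks_notebook" "this" {
  path           = "/Shared/demo-notebook" # placeholder path
  language       = "PYTHON"
  content_base64 = base64encode("print('cluster and notebook are working')")
}
```

Keeping the environment-specific values in variables.tf means the same module can be reused for another environment by passing different inputs.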
Now that you have changed the values mentioned above, upload the code to a GitHub or Azure DevOps repository. If you need assistance with that, visit these pages: GitHub or DevOps.
At this point we have our GitHub or Azure DevOps repository configured with the names we require, so let's create our pipeline to deploy our Databricks environment into our Azure subscription.
First, go to your Azure subscription and check that you don't already have a Databricks workspace called demodb-workspace.
You'll need to install an extension so Azure DevOps can run Terraform commands, so go to Terraform Extension.
Once it is installed in your Azure DevOps project, click on Pipelines > Releases and create a new pipeline. You'll get the option of creating the pipeline with YAML or with the Editor; I'll choose the Editor so we can see the steps more clearly.
In the Artifacts section of the pipeline, click Add an artifact, select your source type (the provider where you uploaded your repository), fill in all the required information as in the image below, and click "Add".
Then click on Add a stage in the Stages section, choose Empty job, and name the stage "DEV".
After that, click on the jobs link below the name of the stage.
In the Agent job, press the "+" button, search for "terraform" and select "Terraform tool installer".
Leave the default settings.
Then add 3 more tasks of type "Terraform".
Name the second task (the one right after the installer) "Init" and fill in the required information as in the image:
For all 3 of these tasks, set your subscription, resource group, storage account and container. There's also a field labeled key, where you have to set "dev.terraform.tfstate"; this is the name of the state file that Terraform uses to keep track of your infrastructure changes.
Name the next task "Plan".
Name the last task "Apply".
Now change the name of your pipeline and save it.
Now we only need to create a Release to test it.
You can monitor the progress while it runs.
When it finishes, if everything went well you'll see your pipeline marked as successful.
Lastly, let's confirm in the Azure Portal that everything was created correctly.
Then log in to your workspace and run the notebook, so you can test that the cluster, the scope, the secret and the notebook are working correctly.
With that, you can easily keep your environments safe from unreviewed changes by contributors, because the pipeline is the only way to get modifications into your infrastructure.
Let us know if you have any comments or questions.
Posted at https://sl.advdat.com/3s6hkrb