Introduction:

In this article, we use YAML pipelines to deploy Azure Synapse code, and we customize the input parameters so that the deployment can be made dynamic.

We also show how to stop and start the triggers, which plays an important role in a Synapse deployment.

Terminology related to Azure Synapse Analytics:

A Synapse workspace is a securable collaboration boundary for doing cloud-based enterprise analytics in Azure. A workspace is deployed in a specific region and has an associated ADLS Gen2 account and file system (for storing temporary data). A workspace is under a resource group.

A workspace allows you to perform analytics with SQL and Apache Spark. Resources available for SQL and Spark analytics are organized into SQL and Spark pools.

Linked services

A workspace can contain any number of linked services, which are essentially connection strings that define the connection information needed for the workspace to connect to external resources.

Synapse SQL

Synapse SQL is the ability to do T-SQL based analytics in a Synapse workspace. Synapse SQL has two consumption models: dedicated and serverless. For the dedicated model, use dedicated SQL pools. A workspace can have any number of these pools. To use the serverless model, use the serverless SQL pool. Every workspace has one of these pools. Inside Synapse Studio, you can work with SQL pools by running SQL scripts.

Apache Spark for Synapse

To use Spark analytics, create and use serverless Apache Spark pools in your Synapse workspace. When you start using a Spark pool, the workspace creates a Spark session to handle the resources associated with that session.

Pipelines

Pipelines are how Azure Synapse provides Data Integration - allowing you to move data between services and orchestrate activities.

Source: For more details, see https://docs.microsoft.com/en-us/azure/synapse-analytics/overview-terminology

Git Integration in Synapse Workspace (Continuous Integration):

For this section you can refer to an existing tech blog from here.

Pre-requisites before Release to higher environments:

1. Make sure the 'Synapse Workspace Deployment' extension is installed from the Visual Studio Marketplace in the organization settings.

2. Make sure the service connection used by the Azure DevOps deployment pipelines has been granted the Synapse Administrator role in the Synapse workspace. (See the screenshot below.)

Stopping and Starting Pipeline Triggers before and after the Deployment:

Similar to Azure Data Factory, you need to stop the pipeline triggers before deploying the Synapse workspace artifacts and start them again after the deployment is complete.

Note: We need to install the "Az.Synapse" module on the agent because this PowerShell module is still in preview.
The code gets the list of triggers from the Synapse workspace and stops or starts each trigger by looping through that list with ForEach-Object.
The following pipeline tasks run inline PowerShell to stop and start the triggers:

#stopping the triggers
      - task: AzurePowerShell@5
        displayName: Stop Triggers
        inputs:
          azureSubscription: '$(azureSubscription)'
          ScriptType: 'InlineScript'
          Inline: |
            # Install the preview module on the agent for the current user
            Install-Module -Name "Az.Synapse" -Confirm:$false -Scope CurrentUser -Force
            # List every trigger in the target workspace and stop each one
            $triggersSynapse = Get-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)"
            $triggersSynapse | ForEach-Object { Stop-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)" -Name $_.name }
          azurePowerShellVersion: 'LatestVersion'

#starting the triggers

      - task: AzurePowerShell@5
        displayName: Restart Triggers
        inputs:
          azureSubscription: '$(azureSubscription)'
          ScriptType: 'InlineScript'
          Inline: '$triggersSynapse = Get-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)" ; $triggersSynapse | ForEach-Object { Start-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)" -Name $_.name }'
          azurePowerShellVersion: 'LatestVersion'
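
Note that the tasks above stop every trigger and later start every trigger, including any that were deliberately stopped before the deployment. One possible variation is to record which triggers were running and restart only those. The following Python sketch illustrates that ordering against an in-memory dictionary. The function name, the dictionary, and the state strings "Started" and "Stopped" are assumptions made for illustration; they are not part of the Az.Synapse module.

```python
def deploy_with_triggers_paused(triggers, deploy):
    """Stop running triggers, run deploy(), then restart only those triggers.

    triggers is a hypothetical in-memory stand-in for the workspace:
    a dict mapping trigger name -> "Started" or "Stopped".
    """
    # Remember which triggers were running before the deployment.
    previously_started = [name for name, state in triggers.items() if state == "Started"]
    for name in previously_started:
        triggers[name] = "Stopped"  # stands in for Stop-AzSynapseTrigger
    try:
        deploy()
    finally:
        # Restart even if the deployment raised, but skip triggers that the
        # deployment removed from the workspace.
        for name in previously_started:
            if name in triggers:
                triggers[name] = "Started"  # stands in for Start-AzSynapseTrigger
    return previously_started
```

In a real pipeline the same idea would have to be expressed with the PowerShell cmdlets shown above; this sketch only shows the order of operations.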

template-parameters-definition.json file to create custom parameters of the workspace template:

You need to parameterize certain values in the ARM template so that you can override them with the names and connection strings of the higher environments (UAT, QA, Prod).
To do this, you need to override the default parameter template.
To override the default parameter template, create a custom parameter template: a file named template-parameters-definition.json in the root folder of your Git collaboration branch. You must use that exact file name.
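
Because the file name and location have to be exact, a small check before committing can help. The following is a minimal Python sketch, assuming the collaboration branch is checked out locally; the function name load_definition and the repo_root argument are hypothetical and not part of any Synapse tooling.

```python
import json
from pathlib import Path

# The exact file name required in the root of the collaboration branch.
DEFINITION_FILE = "template-parameters-definition.json"


def load_definition(repo_root):
    """Load the custom parameter template from the root of a local checkout."""
    path = Path(repo_root) / DEFINITION_FILE
    if not path.is_file():
        raise FileNotFoundError(f"{DEFINITION_FILE} not found in {repo_root}")
    with path.open(encoding="utf-8") as handle:
        definition = json.load(handle)  # raises ValueError on malformed JSON
    if not isinstance(definition, dict):
        raise ValueError("the top level of the file must be a JSON object")
    return definition
```

Calling load_definition(".") from the root of the checkout returns the parsed definition, or raises if the file is missing or is not valid JSON.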

Custom parameter syntax (quoted from the Microsoft documentation):

Source: https://docs.microsoft.com/en-us/azure/synapse-analytics/cicd/continuous-integration-deployment

The following are some guidelines for creating the custom parameters file:

Enter the property path under the relevant entity type.
Setting a property name to * indicates that you want to parameterize all properties under it (only down to the first level, not recursively). You can also provide exceptions to this configuration.
Setting the value of a property as a string indicates that you want to parameterize the property. Use the format <action>:<name>:<stype>.
<action> can be one of these characters:
"=" means keep the current value as the default value for the parameter.
"-" means don't keep the default value for the parameter.
"|" is a special case for secrets from Azure Key Vault for connection strings or keys.
<name> is the name of the parameter. If it's blank, it takes the name of the property. If the value starts with a - character, the name is shortened. For example, AzureStorage1_properties_typeProperties_connectionString would be shortened to AzureStorage1_connectionString.
<stype> is the type of parameter. If <stype> is blank, the default type is string. Supported values: string, securestring, int, bool, object, secureobject and array.
Specifying an array in the file indicates that the matching property in the template is an array. Synapse iterates through all the objects in the array by using the definition that's specified. The second object, a string, becomes the name of the property, which is used as the name for the parameter for each iteration.
A definition can't be specific to a resource instance. Any definition applies to all resources of that type.
By default, all secure strings, like Key Vault secrets, and secure strings, like connection strings, keys, and tokens, are parameterized.
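
To make the <action>:<name>:<stype> format concrete, here is a minimal Python sketch that splits a definition string and builds a parameter name according to the rules listed above. It is only an illustration of those rules, not the code Synapse runs; the function names parse_definition and parameter_name and the slash-separated property path are assumptions.

```python
VALID_ACTIONS = {"=", "-", "|"}


def parse_definition(value):
    """Split '<action>:<name>:<stype>' into (action, name, stype)."""
    parts = value.split(":")
    action = parts[0]
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action {action!r} in {value!r}")
    name = parts[1] if len(parts) > 1 else ""
    # A blank or missing <stype> defaults to string.
    stype = parts[2] if len(parts) > 2 and parts[2] else "string"
    return action, name, stype


def parameter_name(entity, property_path, name=""):
    """Build the parameter name for one property of one entity."""
    segments = property_path.split("/")
    if name.startswith("-"):
        # A leading '-' shortens the name to <entity>_<name>.
        return f"{entity}_{name[1:]}"
    # A blank name falls back to the property name (the last path segment).
    return "_".join([entity, *segments[:-1], name or segments[-1]])


if __name__ == "__main__":
    for definition in ("=", "-::int", "=:triggerSuffix:int", "=:-freq"):
        print(definition, parse_definition(definition))
```

For example, parameter_name("AzureStorage1", "properties/typeProperties/connectionString", "-connectionString") gives AzureStorage1_connectionString, which matches the shortened name in the guidelines.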

Sample template-parameters-definition.json from the Microsoft documentation:

{
    "Microsoft.Synapse/workspaces/notebooks": {
        "properties": {
            "bigDataPool": {
                "referenceName": "="
            }
        }
    },
    "Microsoft.Synapse/workspaces/sqlscripts": {
        "properties": {
            "content": {
                "currentConnection": {
                    "*": "-"
                }
            }
        }
    },
    "Microsoft.Synapse/workspaces/pipelines": {
        "properties": {
            "activities": [{
                 "typeProperties": {
                    "waitTimeInSeconds": "-::int",
                    "headers": "=::object"
                }
            }]
        }
    },
    "Microsoft.Synapse/workspaces/integrationRuntimes": {
        "properties": {
            "typeProperties": {
                "*": "="
            }
        }
    },
    "Microsoft.Synapse/workspaces/triggers": {
        "properties": {
            "typeProperties": {
                "recurrence": {
                    "*": "=",
                    "interval": "=:triggerSuffix:int",
                    "frequency": "=:-freq"
                },
                "maxConcurrency": "="
            }
        }
    },
    "Microsoft.Synapse/workspaces/linkedServices": {
        "*": {
            "properties": {
                "typeProperties": {
                     "*": "="
                }
            }
        },
        "AzureDataLakeStore": {
            "properties": {
                "typeProperties": {
                    "dataLakeStoreUri": "="
                }
            }
        }
    },
    "Microsoft.Synapse/workspaces/datasets": {
        "properties": {
            "typeProperties": {
                "*": "="
            }
        }
    }
}

Here's an explanation of how the preceding template is constructed, broken down by resource type.

Notebooks
Any property in the path properties/bigDataPool/referenceName is parameterized with its default value. You can parameterize the attached Spark pool for each notebook file.

SQL Scripts
Properties (poolName and databaseName) in the path properties/content/currentConnection are parameterized as strings without the default values in the template.

Pipelines
Any property in the path activities/typeProperties/waitTimeInSeconds is parameterized. Any activity in a pipeline that has a code-level property named waitTimeInSeconds (for example, the Wait activity) is parameterized as a number, with a default name. But it won't have a default value in the Resource Manager template. It will be a mandatory input during the Resource Manager deployment.
Similarly, a property called headers (for example, in a Web activity) is parameterized with type object (Object). It has a default value, which is the same value as that of the source factory.

IntegrationRuntimes
All properties under the path typeProperties are parameterized with their respective default values. For example, there are two properties under IntegrationRuntimes type properties: computeProperties and ssisProperties. Both property types are created with their respective default values and types (Object).

Triggers
Under typeProperties, two properties are parameterized. The first one is maxConcurrency, which is specified to have a default value and is of type string. It has the default parameter name <entityName>_properties_typeProperties_maxConcurrency.
The recurrence property is also parameterized. Under it, all properties at that level are specified to be parameterized as strings, with default values and parameter names. An exception is the interval property, which is parameterized as type int. Its parameter name is suffixed with triggerSuffix: <entityName>_properties_typeProperties_recurrence_triggerSuffix. Similarly, the frequency property is parameterized as a string. Because its action is =, it keeps its current value as the default. Its name is shortened and suffixed with freq, for example <entityName>_freq.
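
The way * combines with explicit exceptions under recurrence can be sketched in a few lines of Python. This is an illustration of the rule described above, not the actual Synapse implementation; the function name expand_wildcard and the sample property values are assumptions.

```python
def expand_wildcard(definition, properties):
    """Map each first-level property to the definition string that applies to it.

    definition: one level of the custom parameter file, for example the
                "recurrence" object from the sample above.
    properties: the matching level of the resource itself.
    """
    expanded = {}
    default = definition.get("*")
    for key in properties:
        if key in definition:
            expanded[key] = definition[key]  # explicit exception wins
        elif default is not None:
            expanded[key] = default  # '*' covers this level only, no recursion
    return expanded


if __name__ == "__main__":
    recurrence_definition = {"*": "=", "interval": "=:triggerSuffix:int", "frequency": "=:-freq"}
    # Illustrative recurrence values; a real trigger may have other properties.
    recurrence = {"interval": 15, "frequency": "Minute", "schedule": {"hours": [1]}}
    print(expand_wildcard(recurrence_definition, recurrence))
```

Here the nested schedule object is parameterized as a whole by *, while interval and frequency use their own definitions.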

LinkedServices
Linked services are unique. Because linked services and datasets have a wide range of types, you can provide type-specific customization. In this example, for all linked services of type AzureDataLakeStore, a specific template will be applied. For all others (via *), a different template will be applied.
The connectionString property will be parameterized as a securestring value. It won't have a default value. It will have a shortened parameter name that's suffixed with connectionString.
The property secretAccessKey happens to be an AzureKeyVaultSecret (for example, in an Amazon S3 linked service). It's automatically parameterized as an Azure Key Vault secret and fetched from the configured key vault. You can also parameterize the key vault itself.
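
The type-specific selection for linked services can be pictured as a lookup with a fallback to *. The following Python sketch reuses the linked services section of the sample above; the function name select_template is hypothetical, and how Synapse resolves this internally is not described in the source, so treat it as an illustration of the rule only.

```python
# The linked services section of the sample file shown earlier.
LINKED_SERVICES = {
    "*": {"properties": {"typeProperties": {"*": "="}}},
    "AzureDataLakeStore": {"properties": {"typeProperties": {"dataLakeStoreUri": "="}}},
}


def select_template(definition, service_type):
    """Return the type-specific template if one exists, otherwise the '*' template."""
    if service_type in definition:
        return definition[service_type]
    return definition.get("*")  # None when no '*' template is defined


if __name__ == "__main__":
    print(select_template(LINKED_SERVICES, "AzureDataLakeStore"))
    print(select_template(LINKED_SERVICES, "AzureSqlDatabase"))
```

With this lookup, an AzureDataLakeStore linked service only has dataLakeStoreUri parameterized, while any other type falls back to the * template.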

Datasets
Although type-specific customization is available for datasets, you can provide configuration without explicitly having a *-level configuration. In the preceding example, all dataset properties under typeProperties are parameterized.

Continuous Deployment (Release Pipeline) for Synapse Workspace Deployment:

We will deploy SQL scripts (for dedicated and serverless SQL pools), Spark notebooks, ETL/ELT pipelines, pipeline triggers, linked services, datasets, and so on by using the ARM template that is generated when the workspace artifacts are published from the main branch to the 'workspace_publish' branch.
The following YAML template uses the Synapse workspace deployment task to deploy the workspace artifacts from the ARM template in the workspace_publish branch.
Store the connection strings in Key Vault, override the corresponding linked service parameters in the override parameters section of the Synapse workspace deployment task, and make sure the environment in the task is set to Azure Public. (See the screenshot below.)

One more note on overriding parameters in the Synapse workspace deployment task below:
Specify each override in the following format:

-<parameter-overridden> : <value-to-be-overridden>
-<parameter-overridden> : <value-to-be-overridden>

CD YAML Code

name: Release-$(rev:r)
trigger:
  branches:
    include:
    - workspace_publish
  paths:
    include:
    - '<target workspace name>/*'
resources:
  repositories:
  - repository: <repo name>
    type: git
    name: <repo name>
    ref: workspace_publish
variables:
- name: azureSubscription
  value: '<name of service connection>'
- name: vmImageName
  value: 'windows-2019'
- name: KeyVaultName
  value: '<Kv Name>'
- name: SourceWorkspaceName
  value: '<Source workspace name>'
- name: DeployWorkspaceName
  value: '<Deployment Workspace name>'
- name: DeploymentResourceGroupName
  value: '<Deployment Resource Group name>'
stages:
- stage: Release
  displayName: Release stage
  jobs:
  - job: Release
    displayName: Release job
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: AzureKeyVault@1
      inputs:
        azureSubscription: '$(azureSubscription)'
        KeyVaultName: $(KeyVaultName)
        SecretsFilter: '*'
    - task: AzurePowerShell@5
      displayName: Stop Triggers
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: |
          Install-Module -Name "Az.Synapse" -Confirm:$false -Scope CurrentUser -Force
          $triggersSynapse = Get-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)"
          $triggersSynapse | ForEach-Object { Stop-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)" -Name $_.name }
        azurePowerShellVersion: 'LatestVersion'
    - task: Synapse workspace deployment@1
      inputs:
        TemplateFile: '$(System.DefaultWorkingDirectory)/$(SourceWorkspaceName)/TemplateForWorkspace.json'
        ParametersFile: '$(System.DefaultWorkingDirectory)/$(SourceWorkspaceName)/TemplateParametersForWorkspace.json'
        azureSubscription: '$(azureSubscription)'
        ResourceGroupName: '$(DeploymentResourceGroupName)'
        TargetWorkspaceName: '$(DeployWorkspaceName)'
        DeleteArtifactsNotInTemplate: true
        OverrideArmParameters: |
          workspaceName: $(DeployWorkspaceName)
          #<parameter-overridden> : <value-to-be-overridden> there are parameters in arm template
          #<parameter-overridden> : <value-to-be-overridden>
        Environment: 'prod'
    - task: AzurePowerShell@5
      displayName: Restart Triggers
      inputs:
        azureSubscription: '$(azureSubscription)'
        ScriptType: 'InlineScript'
        Inline: '$triggersSynapse = Get-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)" ; $triggersSynapse | ForEach-Object { Start-AzSynapseTrigger -WorkspaceName "$(DeployWorkspaceName)" -Name $_.name }'
        azurePowerShellVersion: 'LatestVersion'

Posted at https://sl.advdat.com/3GaIzFS

Advanced Data Solutions

Friday, January 7, 2022

Azure Synapse Studio CICD using YAML pipelines
