Tuesday, July 27, 2021

Software Defined Monitoring - Using Automated Notebooks and Azure Sentinel to Improve Sec Ops

Incident triage is a core component of security monitoring operations, and ensuring triage processes are efficient and effective is key to detecting security threats. Recent high-profile security incidents have shown that detecting threats is insufficient unless they are effectively triaged and investigated. In this blog we detail how to deploy and use a solution that automatically executes Jupyter Notebooks to enrich incidents within Azure Sentinel. This process allows security analysts to triage incidents more quickly and effectively, and helps ensure a consistent, quality approach is taken.

 

Background

The objective of this solution is to reduce the time and effort required for a Security Operations Center (SOC) analyst to triage an incident within Azure Sentinel and help ensure a consistent approach is taken to each incident. This is done by automatically executing Jupyter notebooks that perform a set of pre-defined actions on the incident like those conducted by an analyst when triaging an incident. It provides three main benefits:

  • It makes the triaging of incidents more efficient by running some logic against the incidents and adjusting incident severity based on the output of the enrichment run in the notebook.
  • It makes the results of the enrichment easily viewable by the SOC analyst by adding a link to the executed notebook to the incident log. This saves the analysts valuable time when triaging the incident.
  • By executing these steps from a template pattern, it provides a consistent triage approach for all incidents, helping to ensure quality and reduce the chance of a security incident being missed.

We refer to this approach of using notebook patterns to define and execute these processes as Software Defined Monitoring. To learn more about this approach, please watch the recent webinar we presented on the subject.

 

 

Contents

  1. Summary of the solution.
  2. Architecture overview.
  3. Deploying the solution.
    1. Collect required variables.
    2. Add variables to the notebook.
    3. Deploying Papermill infrastructure.
      1. Deploying with ARM.
      2. Manual deployment.
    4. Deploying KeyVault Access.
    5. Installing required packages.
    6. MSTICPy Config
    7. Optional - Using Azure Storage Queues to manage which incidents are triaged.
  4. Using the notebooks during incident triage.
  5. Troubleshooting.

 

Summary of the solution

This document covers the end-to-end process of deploying this solution within an Azure subscription, including all the requisite components for the automated notebook elements. Along with this document are two separate Jupyter Notebooks and an ARM template for deploying the required VM. The first notebook is ‘AutomatedNotebooks-Manager.ipynb’ and the other is ‘AutomatedNotebooks-IncidentTriage.ipynb’. These notebooks, along with the ARM template and a Python requirements file, can be downloaded from GitHub.

 

In addition to these resources, the following are prerequisites:

  • An Azure subscription, with permissions to deploy resources within it.
  • An Azure Virtual Machine (VM) to run the automated notebooks.
  • An Azure Sentinel workspace.
  • An Azure Key Vault, with the ability to read and write secrets to it.
  • An Azure Machine Learning (ML) workspace.

Architecture overview

Architecture.png

 

The core of the solution is an Azure VM that runs several Jupyter notebooks. The Manager notebook programmatically gets details of incidents from Azure Sentinel. If these match a set of criteria, it then runs another notebook that performs triage and enrichment based on the entities attached to that incident. That completed triage notebook is then written to an Azure ML workspace, and a link to the notebook is added as a comment to the incident in Azure Sentinel. From there the SOC analyst can follow the link to view and interact with the completed triage notebook. In addition, depending on the findings in the notebook, the severity of the incident is updated in Azure Sentinel.
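The flow described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code in the Manager notebook: the `Incident` class and the selection criteria in `needs_triage` are assumptions made for the example. The `incident_id` parameter, the ‘papermill’ kernel name, and the use of the incident GUID as the output file name are taken from later sections of this post.

```python
from dataclasses import dataclass
from typing import List

TRIAGE_NOTEBOOK = "AutomatedNotebooks-IncidentTriage.ipynb"


@dataclass
class Incident:
    """Minimal stand-in for an Azure Sentinel incident (hypothetical shape)."""
    incident_id: str
    status: str
    entity_count: int


def needs_triage(incident: Incident) -> bool:
    # Assumed criteria: only new incidents that have entities to enrich.
    return incident.status == "New" and incident.entity_count > 0


def triage_command(incident: Incident) -> List[str]:
    # papermill CLI shape: papermill INPUT OUTPUT -p NAME VALUE -k KERNEL
    return [
        "papermill",
        TRIAGE_NOTEBOOK,
        f"{incident.incident_id}.ipynb",  # output named after the incident GUID
        "-p", "incident_id", incident.incident_id,
        "-k", "papermill",
    ]


def plan_triage(incidents: List[Incident]) -> List[List[str]]:
    """Return one papermill command per incident that meets the criteria."""
    return [triage_command(i) for i in incidents if needs_triage(i)]
```

Each returned command could be passed to `subprocess.run`; building the commands separately from running them keeps the selection logic easy to test.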

 

Deploying the solution

Collect required variables.

To deploy the solution, some configurable variables are first required:

  1. The Azure Tenant ID where the resources being used are. More details.

  2. The Subscription ID where your Azure Sentinel Workspace is deployed.

  1. This can be found in the Azure Portal > Azure Sentinel > Settings > Workspace settings > Overview

1.png

  3. The Resource Group name where your Azure Sentinel Workspace is deployed.

  1. This can be found in the Azure Portal > Azure Sentinel > Settings > Workspace settings > Overview

2.png

  4. The Workspace Name of your Azure Sentinel Workspace.

  1. This can be found in the Azure Portal > Azure Sentinel > Settings > Workspace settings > Overview

3.png

  5. The Workspace ID of your Azure Sentinel Workspace.

  1. This can be found in the Azure Portal > Azure Sentinel > Settings > Workspace settings > Overview

4.png

  6. The Subscription ID where your Azure ML Workspace is deployed.

  1. This can be found in the Azure portal > under the Azure ML resource > Overview

5.png

  7. The Resource Group name where your Azure ML workspace is deployed.

  1. This can be found in the Azure portal > under the Azure ML resource > Overview

6.png

  8. The Azure ML Workspace name.

  1. This can be found in the Azure portal > under the Azure ML resource > Overview

Note: in this example the workspace name is “AzureMLWorkspace”

7.png

  9. The name of the Key Vault being used (see pre-requisites).

  1. This can be found in the Azure Portal > Key Vault > Overview

8.png

  10. Another variable required is an Access Key for the Storage Account used by your Azure ML Workspace. Due to the sensitivity of this key we will store it in KeyVault in order to keep it secure. To find the storage account associated with your Azure ML workspace, find the Azure ML resource in the Azure Portal and browse to the Overview tab. Listed here will be a storage resource:

storage.png

Clicking on that resource will open it in the Azure Portal. From there select Access Keys and you will be presented with two access keys. Select the Key value from one of these (it doesn’t matter which one you use), and then add that as a Secret in your KeyVault. When adding the Access Key, make a note of the name you give the secret as it will be needed later in the setup. (Do not paste the storage key into the notebook.)

 

Adding variables to the notebook

Once the above elements have been collected, the “AutomatedNotebooks-Manager.ipynb” notebook needs to be updated to include these values. They are all set in a single cell near the top of the notebook. Simply open the notebook[i] and replace the placeholder values with those collected above. The cell includes comments that detail where each value should go.

10.png

You will see sections in this cell for details of an Azure Storage Queue. These are optional settings which are covered later in this document (see the Queue Management section).
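Before running the notebook it can be worth checking that no placeholder was left behind and that the ID values are well-formed GUIDs. The sketch below is one way to do that. The field names are hypothetical and will not match the variable names used in the notebook cell, so treat it as a pattern and not as a drop-in check.

```python
import uuid
from dataclasses import dataclass, fields
from typing import List


@dataclass
class NotebookSettings:
    """Values collected in the previous section (field names are hypothetical)."""
    tenant_id: str
    sentinel_subscription_id: str
    sentinel_resource_group: str
    sentinel_workspace_name: str
    sentinel_workspace_id: str
    aml_subscription_id: str
    aml_resource_group: str
    aml_workspace_name: str
    key_vault_name: str
    storage_key_secret_name: str  # the secret's name, never the key itself


# Tenant, subscription, and workspace IDs are GUIDs; the other values are names.
GUID_FIELDS = (
    "tenant_id",
    "sentinel_subscription_id",
    "sentinel_workspace_id",
    "aml_subscription_id",
)


def is_guid(value: str) -> bool:
    """True only for the canonical hyphenated GUID form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def validate(settings: NotebookSettings) -> List[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    for field in fields(settings):
        value = getattr(settings, field.name)
        if not value or value.startswith("<"):
            problems.append(f"{field.name}: placeholder or empty value")
        elif field.name in GUID_FIELDS and not is_guid(value):
            problems.append(f"{field.name}: expected a GUID")
    return problems
```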

 

Deploying Papermill infrastructure

The technology used to run the automated notebooks is Papermill. A dedicated Azure IaaS VM will be deployed to run the Papermill tasks. Here we describe how to deploy an Ubuntu Linux host; however, the solution could also be deployed on a Windows host.

 

Deploying via ARM

To make deployment easier we have created an ARM template that deploys the VM and configures some of the required identity elements. If you want to deploy using this method the ARM template can be downloaded from GitHub, and you can find instructions on how to deploy it here.

During deployment you will be asked to provide a number of parameters, including:

  • The Resource Group you want to deploy the resources in.
  • An SSH key to use to access the VM.
  • The Subscription ID and Resource Group name where your Azure Sentinel workspace is (if different from the Resource Group that the VM is being deployed in).
  • The Subscription ID and Resource Group name where your AzureML workspace is (if different from the Resource Group that the VM is being deployed in).

When deploying the ARM template you will be asked to provide these variables on the following page:

11.png

If you deploy the VM via ARM you can skip ahead to the ‘Deploying KeyVault Access’ section.

 

Deploying Manually

When deploying Azure VMs we recommend that you follow best practice and secure access to the VM using Just In Time Access and Azure Defender.

Detailed instructions on deploying a Linux VM in Azure can be found here. We recommend that a SKU with at least 2 vCPUs and 4 GiB memory is used.

Once the VM is deployed it needs to be assigned a Managed Identity. This identity will be used by the Papermill process to access Azure Sentinel, as well as secrets stored in Azure Key Vault. This can be configured by browsing to the Azure VM created previously in the Azure Portal and selecting the Identity tab. From here select System Assigned and set the Status to On. Once enabled, you need to grant some required permissions to the VM managed identity. To do this, select Azure role assignments.

12.png

The first permission is to access Azure Sentinel. The automated notebooks need to access incident details, query logs to gather context, and update incidents based on the output of their analysis. As such Azure Sentinel Responder role permissions are required. More details about this role can be found here. Currently, this role cannot be set directly on the Azure Sentinel workspace, so the role must be scoped at the Subscription or Resource Group level. We recommend that the Resource Group is used as it’s the lowest level of access available. Ensure that you select the Resource Group that contains the Azure Sentinel workspace you want to use.

13.png

Finally, the papermill process needs to be able to enumerate resources associated with the Azure ML Workspace being used. This is needed to locate the file store used by the Azure ML workspace so that executed notebooks can be written there. Therefore, the VM Managed Identity needs the Reader role assigned for the Resource Group that contains your Azure ML workspace.

14.png

Deploying KeyVault Access

Regardless of how you deployed the VM, you will need to manually grant its managed identity access to the Key Vault that we are using so that it can retrieve the secrets stored there. To do this, open the VM you deployed in the Azure Portal and select the `Identity` section. From there select `Azure Role Assignments` and `Add Role Assignment`. As Key Vault is a specific resource available for Managed Identity role provision, you can select the specific Key Vault you are using for this solution[i] once you have selected `Key Vault` as the scope. The `Key Vault Secrets User` role is required; more details on this role can be found here.

15.png

More details about Managed Identities can be found here.

Note: your KeyVault needs to be configured for Role Based Access Control – more details can be found here: https://docs.microsoft.com/en-us/azure/key-vault/general/rbac-guide?tabs=azure-cli
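To keep track of the role assignments described in the last two sections, it can help to write them down as role and scope pairs. The sketch below builds the Azure resource ID for each scope. The function names are made up for this example, and it assumes you know the subscription and resource group that hold the Key Vault, which are not among the variables collected earlier.

```python
from typing import List, Tuple


def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Resource ID of a resource group, used as a role assignment scope."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def key_vault_scope(subscription_id: str, resource_group: str, vault_name: str) -> str:
    """Resource ID of a single Key Vault."""
    base = resource_group_scope(subscription_id, resource_group)
    return f"{base}/providers/Microsoft.KeyVault/vaults/{vault_name}"


def required_role_assignments(
    sentinel_sub: str, sentinel_rg: str,
    aml_sub: str, aml_rg: str,
    vault_sub: str, vault_rg: str, vault_name: str,
) -> List[Tuple[str, str]]:
    """(role name, scope) pairs for the VM's managed identity."""
    return [
        # Read and update incidents, and query logs.
        ("Azure Sentinel Responder", resource_group_scope(sentinel_sub, sentinel_rg)),
        # Enumerate the resources behind the Azure ML workspace.
        ("Reader", resource_group_scope(aml_sub, aml_rg)),
        # Read secrets such as the storage access key.
        ("Key Vault Secrets User", key_vault_scope(vault_sub, vault_rg, vault_name)),
    ]
```

If you use the optional Storage Queue method described later, a `Storage Queue Data Reader` assignment scoped to the storage account would be added to this list.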

 

Installing required packages

With the VM deployed and the correct permissions assigned via a Managed Identity, the next step is to install Papermill and the other required packages on our VM host. These require Python 3 to be installed first. If you deployed an Ubuntu Linux VM, Python 3 will be installed by default and packages can be installed immediately. If another OS was deployed, you may need to first install Python 3.

 

Optional Step: To manage package installs you can choose to use a solution like Conda or Python virtualenv to create a dedicated Python environment for papermill to operate in. If you don’t plan on running anything except automated notebooks on a host this isn’t essential but might make future management easier. If you choose to do this, follow the steps below within your virtual environment, and when scheduling the regular task ensure you first activate your chosen virtual environment.

 

The installation of packages will be done using pip; this first needs to be installed if not present. On an Ubuntu VM this can be done by running the command:

 

 

sudo apt install python3-pip

 

 

Note: you may need to first run `sudo apt update`

To install the required packages download the autonb-requirements.txt file from GitHub and install the packages detailed in that file using Pip with the following command:

 

 

python3 -m pip install -r autonb-requirements.txt

 

 

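Once installed, ensure that the directory containing the papermill executable is on your $PATH (for a user-level pip install on Linux this is typically ~/.local/bin), then restart the terminal so that the papermill command is available via the CLI.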

Once the packages are installed you will also need to configure a kernel to execute the notebooks with. This is done with ipykernel, which you should have just installed with the above commands. You can create a new kernel called ‘papermill’ (this is the default kernel used by the notebooks) with the following command:

 

 

python3 -m ipykernel install --user --name papermill

 

 

Once papermill and the required packages are installed and the kernel is created, copy the two notebooks (‘AutomatedNotebooks-Manager.ipynb’ and ‘AutomatedNotebooks-IncidentTriage.ipynb’) to the host. These should be stored in the same folder.
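At this point a quick check of the host can save debugging time later. The sketch below looks for the papermill executable on the PATH and for the ‘papermill’ kernel spec in the default per-user Jupyter data directory on Linux. It assumes the kernel was installed with `--user` as shown above and that the `JUPYTER_DATA_DIR` environment variable has not been set to a different location. The function names are illustrative.

```python
import shutil
from pathlib import Path
from typing import List


def kernel_spec_path(kernel_name: str, home: Path) -> Path:
    # Default per-user Jupyter data directory on Linux;
    # JUPYTER_DATA_DIR can override it.
    return home / ".local" / "share" / "jupyter" / "kernels" / kernel_name / "kernel.json"


def check_host(home: Path, kernel_name: str = "papermill") -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if shutil.which("papermill") is None:
        problems.append("papermill executable not found on PATH")
    if not kernel_spec_path(kernel_name, home).is_file():
        problems.append(f"kernel '{kernel_name}' not found for this user")
    return problems
```

Calling `check_host(Path.home())` as the user that will run the scheduled task gives the most representative result.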

 

MSTICPy Config

In order to use threat intelligence providers as part of the incident triage notebook, a msticpyconfig.yaml file containing details of those threat intelligence providers is required on the VM deployed. This should be placed in the same folder as the ‘AutomatedNotebooks-Manager.ipynb’ notebook. It only needs to contain keys for TI providers, and the incident triage notebook will use all primary providers configured. If you are using KeyVault to store these secrets, you will also need to ensure that you assign the VM `Key Vault Secrets User` access to the KeyVault they are stored in.

More details on the msticpyconfig.yaml file and how to set it up can be found in the MSTICPy documentation.

Optional Step: At this point we can test the configuration by manually triggering the notebooks. To do this, ensure you have some incidents present in your Azure Sentinel workspace and then tell Papermill to manually trigger the scheduling notebook. Browse to the folder containing your notebooks and run the following command: `papermill "AutomatedNotebooks-Manager.ipynb" -`. This command will run the scheduling notebook and subsequently the incident triage notebooks; the trailing `-` sends the executed notebook to stdout. You will see the output of the notebook and its execution in stdout and can review this to ensure you don’t have any errors.

 

Once the papermill configuration is complete and the notebooks are set up, you can schedule the ‘AutomatedNotebooks-Manager.ipynb’ notebook to run on a regular basis. This notebook will check for new incidents and run the triage notebook against them. Scheduling is done by simply using the OS’s built-in scheduling service, in this case cron. The precise schedule for this notebook can be tuned depending on your requirements; however, for a prompt response to new incidents being created we suggest that it be set to run every 10 minutes.

 

The scheduled command needs to a) navigate to the folder containing the notebooks and b) execute the ‘AutomatedNotebooks-Manager.ipynb’ notebook. An example cron entry to run the notebook every 10 minutes would be:

 

 

*/10 * * * * cd <path to notebooks folder> && papermill "AutomatedNotebooks-Manager.ipynb" SchedulerOut$
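If you would rather keep the output of each scheduled run in its own file, one option is to have cron call a small Python wrapper instead of papermill directly. The sketch below is an assumption-laden example: the timestamped `SchedulerOut-...` naming scheme and the function names are invented for illustration and are not part of the published solution.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import List, Tuple

MANAGER_NOTEBOOK = "AutomatedNotebooks-Manager.ipynb"


def scheduled_run(notebook_dir: str, now: datetime) -> Tuple[Path, List[str]]:
    """Return the working folder and the papermill command for one run."""
    # Hypothetical naming scheme: one output notebook per run.
    output_name = f"SchedulerOut-{now:%Y%m%dT%H%M%S}.ipynb"
    return Path(notebook_dir), ["papermill", MANAGER_NOTEBOOK, output_name]


def run_now(notebook_dir: str) -> None:
    """Execute the Manager notebook from inside the notebooks folder."""
    folder, command = scheduled_run(notebook_dir, datetime.now())
    # cwd replaces the `cd` step of the cron entry; check=True makes
    # a failing run exit non-zero so cron can report it.
    subprocess.run(command, cwd=folder, check=True)
```

Output notebooks accumulate with this approach, so some periodic clean-up of old files would be needed.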

 

 

Once the schedule is set up, you will start to see incidents with comments that provide a link to a notebook in Azure ML. Only notebooks that include a significant finding are attached to incidents, otherwise the notebook is simply discarded.

16.png

 

Optional – Using Azure Storage Queues to manage which incidents are triaged

By default, the automated notebook process will run against all incidents raised in your Azure Sentinel Workspace. However, if you wish to only run the process against a subset of incidents, you can use a method that leverages Azure Storage Queues. Rather than pull all incidents from Azure Sentinel, the “AutomatedNotebooks-Manager.ipynb” notebook can be configured to pull selected incident IDs from the queue and run the automated notebooks against only those incidents. The following steps are entirely optional and are needed only if you want to use the queue method.

 

Create Storage Queue

For ease of management, we suggest that you create a Storage Queue in the Storage Account used by your Azure ML Workspace (see details earlier in the document to find this). If you choose to use another storage account, ensure that the VM’s Managed Identity is granted access to the Storage Account.

To create a queue, navigate to the Storage Account in the Azure Portal and select Queues. From here you can add a new Queue.

17.png

 

Once the Queue is created you will need to add the Queue name and the Storage Account name to the “AutomatedNotebooks-Manager.ipynb” notebook in the same cell that other variables were added to. In addition, you will need to update the Managed Identity assigned to the Azure VM to add a role of Storage Queue Data Reader scoped to the storage account where the Queue is deployed. More details on this role can be found here.

 

Configure Notebook to use Storage Queue

The “AutomatedNotebooks-Manager.ipynb” notebook contains the code to enable the collection of Incidents from the Queue but it is commented out by default. To use the queue method, uncomment that code and comment out or delete the cell above it that gets all incidents via the Azure Sentinel API.

18.png

 

Configure Incidents to Trigger Notebooks

Once the queue is created you can filter the incidents from Azure Sentinel that you wish to be added to the queue and thus processed by the automated notebooks. This is done with Azure Sentinel Automation Playbooks. For details on creating Playbooks please refer to this documentation.

The required playbook needs only two steps:

  1. When Azure Sentinel incident creation rule was triggered (Preview)
  2. Put a message on a queue:
    1. The Queue Name value should be set to the name of the Queue you created previously.
    2. The Message should be set to the Incident ARM ID dynamic property.

19.png

Once the playbook is created, configure which analytics rules you want auto triaged and configure these to trigger this playbook when an incident is created. Details on how to configure this can be found here. This will then write the incident ID to the queue so that it will be picked up by the “AutomatedNotebooks-Manager.ipynb” notebook.

20.png

 

Once this step is complete and the playbook is attached to one or more analytics rules, the solution is configured. The next time one of the selected rules creates an incident, the automated notebook solution will trigger.

 

Using the notebooks during incident triage

Once the automated notebooks are configured you will start to see triaged incidents appearing in your Azure Sentinel instance. You can identify triaged incidents by the presence of a comment in the incident with a link to a notebook:

21.png

To access the completed triage notebook simply click the link in the comment and you will be directed to Azure ML.

Note: the analysts needing to access the triage notebooks will need access to the Azure ML workspace configured in this process.

Azure ML will open with the triage notebook automatically. The analyst can then browse the contents of the notebook without needing to interact with the notebook itself; they can simply scroll down to see the output:

22.png

Notebooks from other incidents are also accessible via the navigation pane on the left-hand side of the Azure ML interface. Each notebook is stored under the GUID of the incident it relates to.

 

The AutomatedNotebooks-IncidentTriage.ipynb and AutomatedNotebooks-Manager.ipynb notebooks can also be modified to include additional triage steps or update actions as required. By default the process enriches entities attached to the incident and only updates the incident severity; however, it is possible to perform triage on other elements of the incident and update additional elements automatically. See MSTICPy for details of functions and features that could easily be added to these notebooks.
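As an illustration of the kind of update logic that could be changed or extended, the sketch below raises an incident's severity by one level when enrichment finds at least one hit. The rule, the function name, and the `malicious_hits` input are all hypothetical and are not the logic used in the published triage notebook; only the four severity levels are Azure Sentinel's own.

```python
# Azure Sentinel incident severities, lowest to highest.
SEVERITIES = ["Informational", "Low", "Medium", "High"]


def adjust_severity(current: str, malicious_hits: int) -> str:
    """Raise severity one level when enrichment found hits (assumed rule)."""
    level = SEVERITIES.index(current)
    if malicious_hits > 0:
        # Never go beyond the highest defined level.
        level = min(level + 1, len(SEVERITIES) - 1)
    return SEVERITIES[level]
```

A real rule would likely weigh the type and confidence of each finding instead of a simple count.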

 

Troubleshooting

During execution of the “AutomatedNotebooks-Manager.ipynb” notebook a log of activity is written to the file “notebook_execution.log” in the same folder as AutomatedNotebooks-Manager.ipynb. This provides details of execution flow and which incidents were processed, so it should be the first thing you check when troubleshooting.
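On a busy workspace this log can grow quickly, so it is often enough to look at only the most recent entries. The helper below is a small sketch that returns the last few lines of the file. It makes no assumption about the log's line format, and the function names are illustrative.

```python
from collections import deque
from typing import Iterable, List

LOG_FILE = "notebook_execution.log"


def last_lines(lines: Iterable[str], count: int = 20) -> List[str]:
    """Keep only the final `count` lines, without their trailing newlines."""
    # A bounded deque discards older lines as newer ones arrive.
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


def tail_log(path: str = LOG_FILE, count: int = 20) -> List[str]:
    """Read the most recent entries from the Manager notebook's log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return last_lines(handle, count)
```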

23.png

If you are not using the Queue method for incident triggers, the most likely cause of issues is the notebooks themselves. The easiest way to troubleshoot these is to run them individually and inspect the output. To trigger the “AutomatedNotebooks-Manager.ipynb” notebook you can manually invoke it using Papermill via the CLI with the following command:

 

 

papermill "AutomatedNotebooks-Manager.ipynb" -

 

 

You will see the notebook output in the terminal to allow you to debug it. Alternatively, replace the `-` in the command with a file name to write the output to that file instead.

If this notebook executes as expected, then you can also check the incident triage notebook itself. To do this select an incident in your Azure Sentinel workspace that has some entities attached and get the incident ID. This can be found in the incident view of the Sentinel portal as part of the Incident link. (The ID required is the GUID at the end of the full link text).

24.png

From there you can trigger that notebook from the CLI with:

 

 

papermill "AutomatedNotebooks-IncidentTriage.ipynb" debug.ipynb -p incident_id "<INCIDENT ID>"

 

 

This will run the triage notebook with the Incident ID provided and will write the resulting notebook to a file called `debug.ipynb`. This file is a complete copy of the original notebook but with all of the execution results (including any errors). You can open the debug file in Azure ML or another Jupyter notebook environment to check for any execution issues. You can view the file as raw text but the native JSON format makes it difficult to read. You can also convert the notebook to an HTML document using the nbconvert tool.
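Because the executed notebook is JSON, any errors can also be pulled out with a few lines of Python instead of reading the raw file. The sketch below relies on the standard notebook format (version 4), in which a failed code cell carries an output with `output_type` set to `error` along with `ename` and `evalue` fields. The function names are illustrative.

```python
import json
from typing import Dict, List


def find_cell_errors(notebook: Dict) -> List[Dict]:
    """Return one entry per error output found in the notebook's code cells."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append({
                    "cell": index,  # zero-based position in the notebook
                    "ename": output.get("ename", ""),
                    "evalue": output.get("evalue", ""),
                })
    return errors


def load_notebook(path: str) -> Dict:
    """Load an .ipynb file as a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `find_cell_errors(load_notebook("debug.ipynb"))` returns an empty list when every cell ran cleanly.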

 

If you are using the Queue method, you will also want to ensure items are being properly passed to the Queue. This is easily done by browsing to the Queue resource in the Azure Portal. If functioning correctly and the attached incidents have occurred, then the queue should contain full incident links that should appear in the same format as: /subscriptions/796fca0e-7703-476a-9d66-a65d3a7825dd/resourceGroups/Sentinel/providers/Microsoft.OperationalInsights/workspaces/sentinelworkspace/providers/Microsoft.SecurityInsights/Incidents/f9e57a1f-8d1a-4efa-a165-4a48c2b2c46e.
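A message can be checked against this format programmatically. The sketch below uses a regular expression written for this example, based only on the sample ID shown above, so it is an assumption about the format and not the parsing used in the Manager notebook. It returns the components of a well-formed incident ID, or `None` when the message does not match.

```python
import re
from typing import Dict, Optional

GUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

INCIDENT_ID_PATTERN = re.compile(
    rf"^/subscriptions/(?P<subscription_id>{GUID})"
    r"/resourceGroups/(?P<resource_group>[^/\s]+)"
    r"/providers/Microsoft\.OperationalInsights"
    r"/workspaces/(?P<workspace_name>[^/\s]+)"
    r"/providers/Microsoft\.SecurityInsights"
    rf"/Incidents/(?P<incident_id>{GUID})$",
    re.IGNORECASE,
)


def parse_incident_arm_id(message: str) -> Optional[Dict[str, str]]:
    """Split an incident ARM ID into its parts, or return None if malformed."""
    match = INCIDENT_ID_PATTERN.match(message.strip())
    return match.groupdict() if match else None
```

The `incident_id` value in the result is the GUID that the triage notebook takes as its `incident_id` parameter.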

 

Should you encounter issues with this solution please raise an Issue on the Azure-Sentinel-Notebooks GitHub repo.

 

Summary

In this blog we have seen how it is possible to use open source software and Azure services to easily automate the process of executing Jupyter Notebooks linked to Azure Sentinel. This approach can vastly improve the efficiency and effectiveness of SOC operations, as well as forming the core of a software defined monitoring approach. Whilst in this blog we have shown how this process can be used for triaging incidents and supporting first line SOC operations the same pattern could be applied to virtually any security monitoring scenario that involves the repetition of a set of analytical steps, whether it be enrichment of datasets, custom analytics using Python specific features, or threat intelligence processing. 

 

 

[i] This should be the same KeyVault you set up as a pre-requisite to this solution.

[i] VSCode is recommended for this

Posted at https://sl.advdat.com/3iTcCr9