Wednesday, November 17, 2021

Creating your first Microsoft Sentinel Notebook

Jupyter Notebooks are a fantastic resource for security analysts, providing a range of powerful and flexible capabilities. Microsoft Sentinel’s integration with Notebooks can provide a quick and straightforward way for security analysts to use Notebooks, however for those new to Notebooks and coding they can be a little daunting. 

In this blog we will cover some of the basics of creating your first Microsoft Sentinel Notebook using Python, including how to troubleshoot some common issues you may come across.  

  • Installing and importing packages in Python 
  • Installing and importing MSTICPy 
  • Setting up MSTICPy’s config file 
  • Getting data from Microsoft Sentinel 
  • Working with data 
  • Enriching results with external data sources 
  • Visualizations with MSTICPy 

Before we begin, make sure to familiarize yourself with Notebooks in Microsoft Sentinel via Azure Machine Learning.  

Use Jupyter Notebooks to hunt for security threats   

 

If you wish to learn more about this topic, we are running introductory training on December 16th, 2021: Become a Jupyter Notebooks Ninja – MSTICPy Fundamentals to Build Your Own Notebooks. Sign Up Here 

 

Installing and Importing Packages in Python 

One of the important things about using Python in Notebooks is that you can install and use code libraries (referred to as packages) created by others, allowing you to access the functionality they provide without having to code them yourself. 

There are several ways to install Python packages depending on how you want to find and access the packages, however the simplest and easiest is using pip. Pip (https://pypi.org/project/pip/) is the package installer for Python and makes finding and installing Python packages simple. 

You can use pip to install packages via the command line, or if you are using a Notebook, directly in a Notebook cell. Installing directly in a Notebook is often preferred as it ensures that you are installing the package in the same Python environment the Notebook is being executed in. To install via a Notebook code cell, we need to use `%pip` followed by install and the package name. e.g.: 

 

%pip install requests

 

Notebook output of running %pip install requestsNotebook output of running %pip install requests

Note: `%pip` is what is called a magic function in Jupyter. This tells the Notebook to use pip to install the package in the Notebooks compute environment.

If you already have a package installed but you want to update to the latest version, you can add the `--upgrade` parameter to the command used: 

 

%pip install –upgrade requests

 

You may also want to install a specific version of a package. This can be done by specifying the version number. 

 

%pip install requests==2.22.0

 

Output of running %pip install requests==2.22.0Output of running %pip install requests==2.22.0

Note:  Once you have installed a package it is recommended to restart the Notebook kernel, this will ensure that when you import the package you will be using the latest version. This is not necessary with newly installed packages but is important when

 

Note:  During installation of packages you may see some warnings related to package dependencies. This is because some packages have requirements on other packages being installed and sometimes these requirements can have conflicts (i.e., package 1 requires package A version 1.1 but package 2 also requires package A but version 1.2). We try to avoid conflicts as much as possible with our Notebooks but sometimes these can occur. You can usually run the Notebook without the conflicts affecting you. However, if you encounter a problem with a pre-made Microsoft Sentinel Notebook, please report this at via GitHub.

Once a package is installed, you need to import the package before it can be used. This is done with the `import` statement. 

There are 2 ways to import things in Python: 

- `import <package>` - this will do a standard import of the package 

- `from <package> import <item>` - this imports a specific item from the package 

 

You can also import packages and rename them for ease when calling them later: 

`import <package> as <alias>` 

 

import pandas as pd

 

Troubleshooting Tip: Some packages do not use the same name for installation and import. You many need to check package documentation to ensure you are importing correctly.

For example, the popular Machine Learning tool package scikit-learn is installed with: 

 

%pip install scikit-learn

 

However, it is imported with: 

 

import sklearn

 

Installing and Importing MSTICPy 

Now that we know how to install and import packages, we can install packages that will be useful to us in creating our Notebook. MSTICPy is a package created by the Microsoft Threat Intelligence Center (MSTIC) and provides a range of tools to make security analysis and investigations in Notebooks quicker and easier. You cand find out more about MSTICPy here: 

ReadTheDocs - MSTICPy 

We can now install MSTICPy. To make sure we get the latest version if we already have it installed, we are going to use the –upgrade parameter. 

 

%pip install --upgrade msticpy 

 

Now we could import MSTICPy with `import msticpy` however it is a big package with a lot of features, so to make it easier we have a function called `init_notebook` that conducts several checks to make sure the environment is good, handles key imports and set up for us. 

 

import msticpy
msticpy.init_notebook(globals())

 

Notebook output of running previous code cell.Notebook output of running previous code cell.

Setting up MSTICPy’s Config File 

MSTICPy can handle connections to a variety of data sources and services, including Microsoft Sentinel. As such it needs to handle several configuration details and credentials, things such as the Microsoft Sentinel workspaces you want to get data from, or API (Application Programming Interfaces) keys for external services such as Virus Total. 

To make it easier to manage and re-use the configuration and credentials for these things MSTICPy has its own config file that holds these items - `msticpyconfig.yaml`. 

The first time you use MSTICPy you need to populate your msticpyconfig.yaml file. This is a one-time activity once you have created it, you can simply re-use in future. To help with the set-up we have created several Notebook widgets to help you populate the file. 

PeteBryan_3-1637180304684.png

Note: If using Azure Machine Learning then you may notice this config widget can take some time to load. We are working to improve this but if you run the notebook in Jupyter, JupyterLab or VSCode you will not have these performance issues.

We have also created a Notebook to help you create to file. Once you have run the Getting Started Notebook it is recommended that you run the Configuring your Notebook Environment Notebook before creating your first Notebook, you can find this in the Microsoft Sentinel portal.

 

Microsoft Sentinel Notebook feature blade highlighting the Configuring you Notebook Environment NotebookMicrosoft Sentinel Notebook feature blade highlighting the Configuring you Notebook Environment Notebook

 You can also find more documentation on the config file and creation of it, in the MSTICPy docs  

 

Getting Data from Microsoft Sentinel  

Querying data from Microsoft Sentinel is handled by MSTICPy's `QueryProvider`. The first step is to initialize a QueryProvider and tell it we want to use the Microsoft Sentinel Query provider. 

Note: MSTICPy contains several QueryProviders for other data sources as well.

The other thing we want to provide the QueryProvider with is some details of the workspace we want to connect to. We *could* do this manually, but it is much easier to get details from the configuration we set up earlier. We can do this with `WorkspaceConfig` 

 

from msticpy.nbtools import nbinit
nbinit.init_Notebook(namespace=globals())
qry_prov=QueryProvider("MicrosoftSentinel")
ws_config = WorkspaceConfig(workspace="MyWorkspace") 

 

What WorkspaceConfig is doing is creating the connection string used by the QueryProvider. We can see what that connection string looks like with: 

 

ws_config.code_connect_str 

 

Notebook output showing the connection string generated by code_connect_strNotebook output showing the connection string generated by code_connect_str

Once set up we can tell the `QueryProvider` to `connect` which will kick off the authentication process. There are several ways that we can handle that authentication but when starting off we can use the default options that prompts the user to log in using a Device Code. 

 

qry_prov.connect(ws_config) 

 

This will then display a code in the Notebook cell output and prompt you to open a browser and end the code shown. You will then login as normal using your Azure AD (Azure Active Directory) credentials.  

Screenshots of the Device Code authentication flowScreenshots of the Device Code authentication flow

 You can then go back to the Notebook and see that the authentication has been completed: 

Notebook output showing the completed authentication flowNotebook output showing the completed authentication flow

Built-in Queries 

Now that we are connected to Microsoft Sentinel, we can start to look at running some queries to get some data. MSTICPy comes with several built-in Microsoft Sentinel queries to get some common datasets into the Notebook. These are different to the queries included in the Microsoft Sentinel GitHub and are more focused on collecting common sets of data that users might need to answer analytical questions. 

You can see a list of the MSTICPy queries with `.list_queries.` 

Notebook output of the list_queries commandNotebook output of the list_queries command

Note: MSTICPy also includes queries for its other Data Providers, and not just Microsoft Sentinel.

You can also use `.browse_queries()` to see the available queries in an interactive browser widget. 

Notebook output of browse_queriesNotebook output of browse_queries

Running a query 

Now that we have found a query that we want to run we simply pass its name to the `QueryProvider` and that in turn returns to results of the query in a Pandas Data Frame. Most queries support additional parameters, but we are showing one here that does not need any parameters. 

Note: the queries are attached to the QueryProvider as methods (functions) and grouped into categories based on the data source being queried. You can use tab completion or IntelliSense to help you navigate to the query you need.

 

qry_prov.Azure.list_all_signins_geo() 

 

Output of the list_all_signins_geo queryOutput of the list_all_signins_geo query

Troubleshooting tip: If a query does not execute at first make sure you have run `qry_prov.connect()` to authenticate to Microsoft Sentinel first. Notebook cells do not have to be run in order so you can go back and run any that you missed. However, many notebooks do have cells that rely on previous cells being executed first so be careful about jumping ahead if you have not created the notebook yourself.

 

Troubleshooting tip: If a query is not returning the results you expect, pass ‘print’ along as a parameter when calling the query to print out the KQL query being executed.

More typically the query function will expect parameters such as the host name or IP address that you are searching for.  

 

qry_prov.LinuxSyslog.user_logon(host_name="mylxhost") 

 

If you try to run a query without supplying the required parameter, it will return an error message including the help for the query with the parameter definitions. 

Most queries also require date/time parameters for the beginning/ending bounds of the query. By default, these are supplied by a time range set in the query provider. Each instance of a query provider has its own time range. You can change the default query range by running the following. 

 

qry_prov.query_time 

 

This brings up a widget letting you change the defaults for this query provider. You can also supply "start” and “end” parameters to the query function – either as Python datetimes or as time strings:  

 

from datetime import datetime
qry_prov.LinuxSyslog.user_logon(
    host_name="mylxhost",
    start="2021-11-19 20:30",
    end=datetime.utcnow()
)  

 

Customizing Your Queries 

In addition to the stock query, we can customize certain elements of the query. 

For example, if we want to append a line with `| take 10` to the query we have selected to limit the number of results returned we can pass that in with the `add_query_items` parameter: 

 

qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10") 

 

The output of the list_alerts queryThe output of the list_alerts query

Tip: You can also use KQLMagic to query Sentinel data using KQL queries within notebooks. KQLMagic also returns data in a Pandas Data Frame.

 

Working With Data 

Data returned by the `QueryProvider` comes back in a Pandas Data Frame. This provides us with a powerful and flexible way to access our data. 

One of the core things we want to do is look at specific rows in our table. Each table has an index that can be used to call a row using `.loc`, alternatively we can return a row by its position in the table with `.iloc` 

 

alert_df.loc[1] 

 

Selecting a row with ilocSelecting a row with iloc

We can also choose just to return specific columns by providing a list of them to the Data Frame (note the "[:5]” means return the last 5 rows): 

 

 

alert_df.iloc[:5][["AlertName", "AlertSeverity", "Description"]] 

 

Filtering columns of a DataFrameFiltering columns of a DataFrame

We can also do things such as search for rows with specific data: 

 

alert_df[alert_df["AlertName"].str.contains("credential theft")] 

 

Searching for rows of a DataFrame matching a criteriaSearching for rows of a DataFrame matching a criteria

Tip: Pandas has loads and loads of features to help you find, analyze, transform, and visualize data. As Pandas data structures are key to Microsoft Sentinel Notebooks, we recommended you spend some time getting familiar with some of their features they offer - https://pandas.pydata.org/ 

 

Enriching data using external data sources 

One of the powerful elements of Notebooks is combining data from Microsoft Sentinel with data from other sources. One of the most common sources of this data in security is Threat Intelligence (TI) data. MSTICPy has support for several Threat Intelligence data sources including: 

  • VirtusTotal 
  • GreyNoise 
  • AlienVault OTX 
  • IBM XForce 
  • Microsoft Sentinel TI data 
  • OPR (for PageRank details) 
  • ToR ExitNode information. 

The first step in using these TI sources is to create a `TILookup` object. This can then be used to perform lookups against the various supported providers. 

Lookups can be done against individual items via `.lookup_ioc` or against multiple items with `.lookup_iocs` and you can configure things such as which Threat Intelligence sources are used. 

 

ti = TILookup()
ti.lookup_iocs(signin_df, obs_col="IPAddress", providers=["GreyNoise"]) 

 

Lookup_iocs resultsLookup_iocs results

To make viewing results easier there is a widget to allow you to interactively browse results: 

 

ti.browse_results(ti_df) 

 

TI results browser widgetTI results browser widget

Azure API Access 

MSTICPy also has integration with a range of Azure APIs that can be used to retrieve additional information or perform actions such as get Microsoft Sentinel incidents. 

 

from msticpy.data.azure_sentinel import AzureSentinel
azs = AzureSentinel()
azs.connect()
azs.get_incident(incident_id = "7c768f11-31f1-46ca-8a5c-25df2e6b7021", sub_id = "8df49d90-99eb-4c31-985d-64b3f33caa93", res_grp= "sent", ws_name="workspace") 

 

Output of Azure APIsOutput of Azure APIs

You can find out more about MSTICPy’s support for Azure APIs in the documentation: https://msticpy.readthedocs.io/en/latest/data_acquisition/AzureData.html & https://msticpy.readthedocs.io/en/latest/data_acquisition/AzureSentinel.html  

 

Visualizations with MSTICPy   

The ability to create complex, interactive visualizations is one of the key benefits of Notebooks, allowing analysts to see data in a unique way and use it to identify patterns of anomalies that may not otherwise be possible to identify.  

Creating these visualizations from scratch can be quite a complex task and involve a lot of code if starting from nothing. To make the process easier MSTICPy contains several common visualizations work out the box with common data sources from Microsoft Sentinel, and that can quickly and easily be called with minimal code. 

 

Timelines 

Understanding when events occurred and in what order is a key component of many security investigations. MSTICPy can plot diverse types of timelines with several types of data. 

 

user_df = qry_prov.Azure.list_aad_signins_for_account(account_name="pdemo@seccxpninja.onmicrosoft.com")
timeline.display_timeline(user_df, source_columns=["UserPrincipalName", "ResultType"] 

 

Timeline visualizationTimeline visualization

Troubleshooting Tip: If you are defining columns from a DataFrame as a parameter in another function (as we do above with source_columns) you can sometimes run into issues if you specify a column that does not exist. If you want to see what columns a DataFrame has you can call `DataFrame.columns` to get a list of all the columns.

We can also plot time lines showing events with a duration rather than a single time stamp with ` display_timeline_duration`: 

 

timeline_duration.display_timeline_duration(alert_df, group_by="AlertName", time_column="StartTimeUtc", end_time_column="EndTimeUtc") 

 

Timeline duration visualizationTimeline duration visualization

Tip: You can also call the timeline visualization directly from a DataFrame with ‘mp_plot’

 

alert_df.mp_plot.timeline(group_by="Severity", source_columns=["AlertName", "TimeGenerated"]) 

 

Grouped timeline visualizationGrouped timeline visualization

Matrix Plots 

The Matrix Plot graph in MSTICPy allows you to plot the interactions between two elements in your data. This can be useful for seeing the relationships between points in a dataset, for example if you wanted to see how often certain IP addresses are communicating with each other in a network you can create a matrix plot with a source IP address on one axis, and a destination IP address on the other axis. 

As with the timeline plots, the matrix plot can be created directly from a DataFrame using `mp_plot`: 

 

network_data.mp_plot.matrix(x="SourceIP", y="DestinationIP", title="IP Interaction") 

 

Matrix visualizationMatrix visualization

Widgets 

We have seen a couple of widgets already in the query and threat intelligence result browsers. These widgets make Notebooks much more accessible by providing a visual way to interact and customize them without having to write any code. MSTICPy includes a number visual, interactive widgets to allow users to select various parameters to customize the Notebook. 

 

network_vendor_data_q = "CommonSecurityLog | summarize by DeviceVendor"
network_vendor_data = qry_prov.exec_query(network_vendor_data_q)
network_selector = nbwidgets.SelectItem(
    item_list=network_vendor_data["DeviceVendor"].to_list(),
    description='Select a vendor',
    action=print,
    auto_display=True
); 

 

Using the SelectItem widget to select a network vendor from dataUsing the SelectItem widget to select a network vendor from data

 

q_times = nbwidgets.QueryTime(units='day', max_before=20, before=5, max_after=1)
q_times.display() 

 

Time range selection widgetTime range selection widget

 

security_alerts = qry_prov.SecurityAlert.list_alerts(add_query_items="| take 10")
alert_select = nbwidgets.SelectAlert(alerts=security_alerts, action=nbdisplay.display_alert)
display(Markdown('### Alert selector with action=DisplayAlert'))
display(HTML("<b> Alert selector with action=DisplayAlert </b>"))
alert_select.display() 

 

Alert selector widgetAlert selector widget

What to do Next 

What you have seen here is just a tiny taster of what Microsoft Sentinel Notebooks can do. However, luckily, we have a lot of additional resources to help you learn what you need and get started with Notebooks. 

We recommend that you do the following: 

Posted at https://sl.advdat.com/3nmKe49