NVIDIA Triton Inference Server in Azure Machine Learning with managed online endpoints

We previously announced the public preview of managed online endpoints in Azure Machine Learning. Today we are excited to add a new feature to this capability: you can now deploy Triton-format models in Azure Machine Learning with managed online endpoints.

Triton is multi-framework, open-source software that is optimized for inference. It supports popular machine learning frameworks like TensorFlow, ONNX Runtime, PyTorch, NVIDIA TensorRT, and more. It can be used for your CPU or GPU workloads. You can deploy models using either the CLI (command line) or Azure Machine Learning studio.

Deploy model using Azure Machine Learning CLI (v2)

1. Prerequisites

You need the Azure CLI and the ml extension for the Azure CLI. For more information, see Install, set up, and use the CLI (v2) (preview).

Clone the azureml-examples GitHub repository.

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples
cd cli

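BASE_PATH=endpoints/online/triton/single-model
# Placeholder endpoint name used by the commands below; endpoint names must be unique within an Azure region.
ENDPOINT_NAME=my-endpoint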

2. Create endpoint

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: aml_token

az ml online-endpoint create -n $ENDPOINT_NAME -f $BASE_PATH/create-managed-endpoint.yaml

3. Create deployment

name: blue
endpoint_name: my-endpoint
model:
  name: sample-densenet-onnx-model
  version: 1
  local_path: ./models
  model_format: Triton
instance_count: 1
instance_type: Standard_NC6s_v3

az ml online-deployment create --name blue --endpoint $ENDPOINT_NAME -f $BASE_PATH/create-managed-deployment.yaml --all-traffic
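The deployment YAML points local_path at ./models, which Triton treats as a model repository. As a rough sketch, assuming the standard Triton layout of <repository>/<model-name>/<numeric-version>/<model-file>, the following Python helper lists the models and versions found in such a directory. The function name list_triton_models is hypothetical and is not part of the sample repository or of any Azure or Triton library.

```python
from pathlib import Path


def list_triton_models(repo_root):
    """Return {model_name: [version, ...]} for a Triton-style model repository.

    Assumes the conventional layout <repo_root>/<model_name>/<version>/<files>,
    where each version directory has a numeric name and holds at least one file.
    """
    models = {}
    for model_dir in sorted(Path(repo_root).iterdir()):
        if not model_dir.is_dir():
            continue  # skip stray files at the repository root
        versions = sorted(
            int(v.name)
            for v in model_dir.iterdir()
            # keep only non-empty directories whose name is a version number
            if v.is_dir() and v.name.isdigit() and any(v.iterdir())
        )
        if versions:
            models[model_dir.name] = versions
    return models
```

Calling list_triton_models("./models") before creating the deployment is a quick way to confirm that the directory contains at least one model with a numbered version folder.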

4. Invoke your endpoint

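Before running the sample script, set scoring_uri to the base URL of the endpoint and auth_token to an access token for it (for example, using az ml online-endpoint show and az ml online-endpoint get-credentials).

python $BASE_PATH/triton_densenet_scoring.py --base_url=$scoring_uri --token=$auth_token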
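The sample script talks to the deployed Triton server over HTTP. To illustrate what such a client sends, the sketch below builds (but does not send) a readiness check and an inference request that follow Triton's HTTP/REST v2 inference protocol, using only the Python standard library. The helper names are hypothetical, and the model and tensor names in the usage note are assumptions; check the sample script and the model's configuration for the actual values.

```python
import json
import urllib.request


def _auth_headers(token):
    # The endpoint token is passed as a bearer credential.
    return {"Authorization": f"Bearer {token}"}


def build_ready_request(base_url, token):
    """GET <base_url>/v2/health/ready: asks whether the server can serve requests."""
    url = f"{base_url.rstrip('/')}/v2/health/ready"
    return urllib.request.Request(url, headers=_auth_headers(token), method="GET")


def build_infer_request(base_url, token, model_name, input_name, shape, values, output_name):
    """POST <base_url>/v2/models/<model_name>/infer with a JSON-encoded FP32 tensor.

    `values` is the tensor content flattened in row-major order.
    """
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": "FP32",
                "data": list(values),
            }
        ],
        "outputs": [{"name": output_name}],
    }
    headers = _auth_headers(token)
    headers["Content-Type"] = "application/json"
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
```

Passing either request object to urllib.request.urlopen would send it. For a DenseNet-style image classifier the call might look like build_infer_request(scoring_uri, auth_token, "densenet_onnx", "data_0", (3, 224, 224), pixels, "fc6_1"), where all of those names are assumed here rather than taken from the deployed model. In practice the tritonclient package handles this encoding, including a more compact binary tensor format.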

5. Delete your endpoint and model

az ml online-endpoint delete -n $ENDPOINT_NAME --yes

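Set MODEL_NAME and MODEL_VERSION to the name and version of the registered model (sample-densenet-onnx-model and 1 in the deployment YAML above).

az ml model delete --name $MODEL_NAME --version $MODEL_VERSION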

Deploy model using Azure Machine Learning studio

1. Register your model in Triton format using the following YAML and CLI command.

Get the sample model from our samples GitHub repository: azureml-examples/cli/endpoints/online/triton/single-model (Azure/azureml-examples on GitHub).

name: densenet-onnx-model
version: 1
local_path: ./models
model_format: Triton
description: Registering my Triton format model.

az ml model create -f create-triton-model.yaml

2. Deploy from the Endpoints or Models page in Azure Machine Learning studio

When you deploy a Triton-format model, you do not need to provide a scoring script or an environment.


Summary

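The integration of Azure Machine Learning with NVIDIA Triton Inference Server simplifies model deployment: you can deploy a Triton-format model to a managed online endpoint from the CLI or the studio without writing a scoring script or defining an environment.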

Resources

Documentation: High-performance serving with Triton Inference Server

Samples: azureml-examples/cli/endpoints/online/triton/single-model (Azure/azureml-examples on GitHub)

Posted at https://sl.advdat.com/3qbCFz3

Friday, November 5, 2021
