Today, we are announcing the public preview of the ability to use custom Docker containers in Azure Machine Learning online endpoints. In combination with our new 2.0 CLI, this feature enables you to deploy a custom Docker container while getting Azure Machine Learning online endpoints’ built-in monitoring, scaling, and alerting capabilities.
Below, we walk you through how to use this feature to deploy TensorFlow Serving with Azure Machine Learning. The full code is available in our samples repository.
Sample deployment with TensorFlow Serving
To deploy a TensorFlow model with TensorFlow Serving, first create a YAML file (saved here as endpoint.yml):
name: tfserving-endpoint
type: online
auth_mode: aml_token
traffic:
  tfserving: 100

deployments:
  - name: tfserving
    model:
      name: tfserving-mounted
      version: 1
      local_path: ./half_plus_two
    environment_variables:
      MODEL_BASE_PATH: /var/azureml-app/azureml-models/tfserving-mounted/1
      MODEL_NAME: half_plus_two
    environment:
      name: tfserving
      version: 1
      docker:
        image: docker.io/tensorflow/serving:latest
      inference_config:
        liveness_route:
          port: 8501
          path: /v1/models/half_plus_two
        readiness_route:
          port: 8501
          path: /v1/models/half_plus_two
        scoring_route:
          port: 8501
          path: /v1/models/half_plus_two:predict
    instance_type: Standard_F2s_v2
    scale_settings:
      scale_type: manual
      instance_count: 1
      min_instances: 1
      max_instances: 2
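Here, MODEL_BASE_PATH points to the path where Azure Machine Learning mounts the registered model inside the container, and the liveness, readiness, and scoring routes map to TensorFlow Serving's REST API on port 8501. If you want to sanity-check the image and model before deploying, you can run the same container locally. The following is a minimal sketch, assuming Docker is installed and the half_plus_two SavedModel is in ./half_plus_two (TensorFlow Serving's default model base path inside the container is /models):

# Start TensorFlow Serving locally with the model mounted
docker run -d -p 8501:8501 \
  -v "$(pwd)/half_plus_two:/models/half_plus_two" \
  -e MODEL_NAME=half_plus_two \
  docker.io/tensorflow/serving:latest

# The readiness route should report the model version as AVAILABLE
curl http://localhost:8501/v1/models/half_plus_two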
Then create your endpoint:
az ml endpoint create -f endpoint.yml
And that’s it! You now have a scalable TensorFlow Serving endpoint running on Azure ML-managed compute.
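To verify the endpoint end to end, you can send a test request to the scoring route. This is a sketch that assumes the preview 2.0 CLI's invoke command and a hypothetical sample-request.json in TensorFlow Serving's predict format:

# sample-request.json:
# {"instances": [1.0, 2.0, 5.0]}

az ml endpoint invoke --name tfserving-endpoint --request-file sample-request.json

The half_plus_two sample model computes 0.5 * x + 2 for each input, so the request above should return [2.5, 3.0, 4.5].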
Next steps
- Read our documentation
- See the sample with TorchServe
- Learn more about our Azure-built inference images
- Look out for future samples showing ML.NET and R support