Santa Claus has been kidnapped!
The Christmas elves have called upon you to save Santa Claus by developing an intelligent AI-app. You will build an object detection system that detects Santa Claus in images taken from live cameras mounted all over the Christmas village.
Foteini Savvidou
Hi, I am Foteini Savvidou, a Beta Microsoft Learn Student Ambassador!
I am an undergraduate Electrical and Computer Engineering student at Aristotle University of Thessaloniki (Greece) interested in AI, cloud technologies and biomedical engineering. Always passionate about teaching and learning new things, I love helping people expand their technical skills through organizing workshops and sharing articles on my blog. My goal is to use technology to promote accessibility, digital and social inclusion.
Introduction
Azure Custom Vision is an Azure Cognitive Services service that lets you build and deploy your own image classification and object detection models. Image classification models apply labels to an image, while object detection models return the bounding box coordinates in the image where the applied labels can be found.
Do you want to learn more about Azure Custom Vision? You can read my previous articles about creating a Custom Vision model for flower classification and an object detection model for grocery checkout.
In this article, we will build and deploy a festive object detection model to help the Christmas elves find Santa Claus. You will learn how to:
- Provision a Custom Vision resource.
- Build and train a custom object detection model in Azure Custom Vision.
- Use the Smart Labeler to easily tag images.
- Deploy and consume the model.
- Use Python and OpenCV to analyze images from a camera.
To complete the exercise, you will need an Azure subscription. If you don’t have one, you can sign up for an Azure free account. If you are a student, you can apply for an Azure for Students subscription.
Collect the data
To build and train our machine learning model, I created an image dataset consisting of 50 images of Santa Claus. You can download the dataset from my GitHub repository.
Create a Custom Vision Resource
To use the Custom Vision service, you can either create a Custom Vision resource or a Cognitive Services resource. If you plan to use Custom Vision along with other cognitive services, you can create a Cognitive Services resource.
In this exercise, you will create a Custom Vision resource.
- Sign in to Azure Portal and select Create a resource.
- Search for Custom Vision and in the Custom Vision card click Create.
- Create a Custom Vision resource with the following settings:
  - Create options: Select Both.
  - Subscription: Your Azure subscription.
  - Resource group: Select an existing resource group or create a new one.
  - Name: This would be your custom domain name in your endpoint. Enter a unique name.
  - Training Resource:
    - Location: Choose any available region, for example East US.
    - Pricing tier: You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier.
  - Prediction Resource:
    - Location: Choose any available region, for example East US.
    - Pricing tier: You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier.
- Select Review + Create and wait for deployment to complete.
- Once the deployment is complete, select Go to resource. Two Custom Vision resources are provisioned, one for training and one for prediction.
Create a new Custom Vision project
You can build and train your model by using the web portal or the Custom Vision SDKs and your preferred programming language. In this article, I will show you how to build an object detection model using the Custom Vision web portal.
- Navigate to the Custom Vision portal and sign in.
- Create a new project with the following settings:
  - Name: SantaClausDetector
  - Description: A festive object detection project
  - Resource: The Custom Vision resource you created in the previous step.
  - Project Types: Object detection
  - Domains: General. Learn more about Custom Vision project domains at Microsoft Docs.
- Select Create project.
Upload and tag images
- In your Custom Vision project, select Add images.
- Select and upload all the images in the Train folder you extracted previously.
- Open the first image and manually tag the objects that you want the model to learn to recognize.
 
- Repeat the previous step for the remaining images.
- Then, explore the images that you have uploaded. There should be 42 images of Santa Claus.
- Select Add images and upload all the images in the SmartLabeler folder. Do not tag these images. You will train the model and then use the Smart Labeler to easily generate labels for the untagged images.
Train the model
- In the top menu bar, click the Train button to train the model using the tagged images.
- Then, in the Choose Training Type window, select Quick Training and wait for the training iteration to complete.
 
Evaluate the model
- When the training finishes, information about the model’s performance is estimated and displayed.
 
- The Custom Vision service calculates three metrics:
  - Precision indicates the percentage of the class predictions that were correct.
  - Recall indicates the percentage of the actual objects that the model correctly identified.
  - Average precision (AP) measures model performance by computing the precision and recall at different thresholds.
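To make the first two metrics concrete, here is a minimal sketch that computes precision and recall from detection counts. The function name and the counts are hypothetical and only illustrate the standard definitions; the Custom Vision service computes these values for you, and its exact procedure may differ.

```python
from typing import Tuple


def precision_recall(true_positives: int, false_positives: int, false_negatives: int) -> Tuple[float, float]:
    """Return (precision, recall) computed from detection counts."""
    predicted = true_positives + false_positives  # everything the model detected
    actual = true_positives + false_negatives     # everything that was really there
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 false alarms, 8 missed Santas
precision, recall = precision_recall(true_positives=8, false_positives=2, false_negatives=8)
print(f"Precision: {precision:.0%}, Recall: {recall:.0%}")
```

With these counts, most of the detections are correct (high precision), but half of the Santas in the images go unnoticed (lower recall).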
 
Test the model
Let’s test the model and see how it performs on new data. We will use the images in the Test folder you extracted previously.
- In the top menu bar, select Quick Test.
- In the Quick Test window, click the Browse local files button and select a local image. The prediction is shown in the window.
 
Use the Smart Labeler
The Smart Labeler enables you to quickly tag a large number of images. The service uses the latest iteration of the trained model to predict the label of the untagged images. You can then confirm or decline the suggested tag.
- Navigate to the Training Images tab and under Tags select Untagged.
- Then, click the Get suggested objects button on the left pane.
- In the Set Smart Labeler Preference window, select the number of images for which you want suggestions. You can generate labels for a portion of images, then train the model and repeat this process. This way, you will improve the model and get better suggestions for the remaining untagged images.
- In this article, we will use the Smart Labeler to label all the untagged images. In the Set Smart Labeler Preference window, select All untagged images and then click Get started.
 
- Once the process is complete, you can confirm the suggestions or change the suggested labels and bounding box coordinates manually.
You can learn more about the Smart Labeler at the Custom Vision Service Documentation.
Train and evaluate the new model
- In the top menu bar, click the Train button and wait for the second training iteration to complete.
- Once the training is complete, review the performance metrics of the new model.
 
You can add more images to your project to improve the performance metrics. Learn more about how to improve your object detection model at the Custom Vision Service Documentation.
Test the model
Before publishing our model, let’s test it and see how it performs on new data.
 
Deploy the model
Once your model is performing at a satisfactory level, you can deploy it.
Publish the model
- In the Performance tab, select the latest iteration and then click Publish.
- In the Publish Model window, under Prediction resource, select the name of your Custom Vision prediction resource and then click Publish.
 
- Once your model has been successfully published, you'll see a Published label appear next to your iteration name in the left sidebar.
Get the ID of your project
In the Custom Vision portal, click the settings icon (⚙) at the top toolbar to view the project settings. Then, under General, copy the Project ID.
Get the key and endpoint of the prediction resource
Navigate to the Custom Vision portal homepage and select the settings icon (⚙) at the top right. Expand your prediction resource and save the Key and the Endpoint.
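Rather than pasting the key into a script, you may prefer to keep the project ID, key and endpoint in environment variables. The sketch below shows one possible way to do this; the function name and the variable names (`CUSTOM_VISION_PROJECT_ID`, `CUSTOM_VISION_PREDICTION_KEY`, `CUSTOM_VISION_ENDPOINT`) are assumptions made for this example and are not defined by Azure.

```python
import os


def load_prediction_settings(env=None):
    """Read the project ID, prediction key and endpoint from environment variables."""
    env = os.environ if env is None else env
    # Hypothetical variable names chosen for this example
    names = ("CUSTOM_VISION_PROJECT_ID", "CUSTOM_VISION_PREDICTION_KEY", "CUSTOM_VISION_ENDPOINT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in names)


# Example with placeholder values (omit the argument to read the real environment)
example_env = {
    "CUSTOM_VISION_PROJECT_ID": "<YOUR_PROJECT_ID>",
    "CUSTOM_VISION_PREDICTION_KEY": "<YOUR_KEY>",
    "CUSTOM_VISION_ENDPOINT": "<YOUR_ENDPOINT>",
}
project_id, prediction_key, endpoint = load_prediction_settings(example_env)
print(project_id)
```

Failing early with a clear message when a value is missing is usually easier to debug than an authentication error returned by the service.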
Test the prediction endpoint in a Python app
To create an object detection app with Custom Vision for Python, you'll need to install the Custom Vision client library. Install the Azure Cognitive Services Custom Vision SDK for Python package with pip:
pip install azure-cognitiveservices-vision-customvision
Then, create a new Python script (test.py) and open it in Visual Studio Code or in your preferred editor.
Want to view the whole Python script at once? You can find it on GitHub.
- Import the following libraries.

      from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
      from msrest.authentication import ApiKeyCredentials
      from PIL import Image, ImageDraw, ImageFont
      import numpy as np
      import os
- Then, add this code. Replace <YOUR_PROJECT_ID>, <YOUR_KEY> and <YOUR_ENDPOINT> with the ID of your project, the key and the endpoint of your prediction resource, respectively.

      # Create variables for your project
      # (use the name under which you published your iteration)
      publish_iteration_name = "Iteration4"
      project_id = "<YOUR_PROJECT_ID>"

      # Create variables for your prediction resource
      prediction_key = "<YOUR_KEY>"
      endpoint = "<YOUR_ENDPOINT>"
      prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
      predictor = CustomVisionPredictionClient(endpoint, prediction_credentials)
- Then, use the following code to call the prediction API in Python.

      # Detect objects in the test image
      img_file = os.path.join('Images', 'Test', 'SantaClaus (1).jpg')
      with open(img_file, mode="rb") as test_img:
          results = predictor.detect_image(project_id, publish_iteration_name, test_img)
- Finally, add the following code, which draws a bounding box around every detected object, labels it with its probability and saves the annotated image as result.jpg.

      # Load a test image and get its dimensions
      img = Image.open(img_file)
      img_height, img_width, img_ch = np.array(img).shape

      # Create a drawing context for the image
      draw = ImageDraw.Draw(img)

      # Select line width (at least 1 pixel), color and font for the annotations
      lineWidth = max(1, int(img_width/100))
      color = (0,255,0)
      font = ImageFont.truetype("arial.ttf", 18)

      # Draw the results
      for prediction in results.predictions:
          if prediction.probability > 0.5:
              # The bounding box is normalized, so scale it to pixels
              left = prediction.bounding_box.left * img_width
              top = prediction.bounding_box.top * img_height
              height = prediction.bounding_box.height * img_height
              width = prediction.bounding_box.width * img_width
              # Create a rectangle
              draw.rectangle((left, top, left+width, top+height), outline=color, width=lineWidth)
              # Display probabilities
              draw.text((left, top-20), f"{prediction.probability * 100 :.2f}%", fill=color, font=font)

      img.save("result.jpg")
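The script multiplies each bounding box value by the image width or height because the prediction API returns coordinates normalized to the range 0 to 1. As a small illustration, this conversion could be wrapped in a helper like the one below. `PixelBox` and `to_pixel_box` are hypothetical names introduced for this sketch and are not part of the Custom Vision SDK; clamping to the image borders is an extra safeguard that the original script does not apply.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(left, top, width, height, img_width, img_height) -> PixelBox:
    """Convert a normalized bounding box (values in 0..1) to pixel corner coordinates."""
    x1 = round(left * img_width)
    y1 = round(top * img_height)
    x2 = round((left + width) * img_width)
    y2 = round((top + height) * img_height)
    # Clamp the corners so that the box never leaves the image
    return PixelBox(max(0, x1), max(0, y1), min(img_width, x2), min(img_height, y2))


# A box that starts a quarter of the way across and halfway down a 640x480 image
box = to_pixel_box(left=0.25, top=0.5, width=0.5, height=0.25, img_width=640, img_height=480)
print(box)
```

The returned corners can be passed directly to `draw.rectangle` or `cv2.rectangle`.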
Analyze images from camera with OpenCV
First, install OpenCV using the following command:
pip install opencv-python
We will use OpenCV to get an image from the camera, then we will analyze the image using our Custom Vision model and display a bounding box around every detected object.
- Create a new Python script (test-camera.py) and import the following libraries.

      import cv2
      from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
      from msrest.authentication import ApiKeyCredentials
- Then, add this code to define your credentials.

      # Create variables for your project
      publish_iteration_name = "Iteration4"
      project_id = "<YOUR_PROJECT_ID>"

      # Create variables for your prediction resource
      prediction_key = "<YOUR_KEY>"
      endpoint = "<YOUR_ENDPOINT>"
      prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
      predictor = CustomVisionPredictionClient(endpoint, prediction_credentials)
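- Use the following code to take an image from your camera and save it in a file. The cv2.CAP_DSHOW flag selects the DirectShow backend, which is available on Windows; on other systems you can omit it.

      camera = cv2.VideoCapture(0, cv2.CAP_DSHOW)
      camera.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
      camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
      ret, image = camera.read()
      # Stop early if the camera did not return a frame
      if not ret:
          raise RuntimeError("Could not read an image from the camera")
      cv2.imwrite('capture.png', image)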
- Then, call the prediction API.

      with open("capture.png", mode="rb") as captured_image:
          results = predictor.detect_image(project_id, publish_iteration_name, captured_image)
- Now, you can draw a bounding box and the predicted probability for every detected object and save the result.

      # Select color for the bounding box
      color = (0,255,0)

      # Use the actual frame size, since the camera may not apply the requested 640x480
      img_height, img_width = image.shape[:2]

      # Start from the captured frame so that a result is saved even if nothing is detected
      result_image = image

      # Draw the results
      for prediction in results.predictions:
          if prediction.probability > 0.5:
              left = prediction.bounding_box.left * img_width
              top = prediction.bounding_box.top * img_height
              height = prediction.bounding_box.height * img_height
              width = prediction.bounding_box.width * img_width
              result_image = cv2.rectangle(result_image, (int(left), int(top)), (int(left + width), int(top + height)), color, 3)
              cv2.putText(result_image, f"{prediction.probability * 100 :.2f}%", (int(left), int(top)-10), fontFace = cv2.FONT_HERSHEY_SIMPLEX, fontScale = 0.7, color = color, thickness = 2)

      cv2.imwrite('result.png', result_image)
- Then, release the camera you have used.

      camera.release()
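To help the elves, the app ultimately has to decide whether Santa Claus appears in a frame at all. The following sketch shows one way to make that decision from a list of predictions by keeping only confident detections and picking the best one. It uses `SimpleNamespace` stand-ins instead of real API results so that it runs without an Azure resource. The function name `find_santa` and the tag name "Santa Claus" are assumptions; use the tag you created in your own project.

```python
from types import SimpleNamespace


def find_santa(predictions, tag_name="Santa Claus", threshold=0.5):
    """Return the most confident prediction for tag_name above threshold, or None."""
    candidates = [
        p for p in predictions
        if p.tag_name == tag_name and p.probability > threshold
    ]
    return max(candidates, key=lambda p: p.probability, default=None)


# Stand-ins that mimic the tag_name and probability attributes of a prediction
fake_predictions = [
    SimpleNamespace(tag_name="Santa Claus", probability=0.35),
    SimpleNamespace(tag_name="Santa Claus", probability=0.92),
    SimpleNamespace(tag_name="Santa Claus", probability=0.61),
]

best = find_santa(fake_predictions)
if best is not None:
    print(f"Santa found with {best.probability:.0%} confidence!")
else:
    print("No Santa in this frame.")
```

In the camera script, you could pass `results.predictions` to the same function after each call to the prediction API.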
Summary and next steps
In this article, you learned how to create an object detection model in Azure Custom Vision and use a Custom Vision model in a Python app. If you are interested in learning more about Azure Custom Vision, check out these Microsoft Learn modules:
- Detect objects in images with the Custom Vision service
- Create computer vision solutions with Azure Cognitive Services: Detect objects in images
Share your awesome Custom Vision projects and feel free to reach out to me on LinkedIn or Twitter.
Clean-up
If you have finished learning, you can delete the resource group from your Azure subscription:
- In the Azure portal, select Resource groups on the left menu and then select the resource group that you have created.
- Click Delete resource group.
