Advanced Data Solutions : How to speed up Azure Digital Twins Queries with Caching Strategy

If you are developing applications with Azure Digital Twins (ADT) that materialize large twin graphs, this article is for you. Read on to learn how we accelerated services that need to traverse these graphs and how you can do the same with the right caching strategy.

As part of the CSE global engineering organization at Microsoft, our team developed an ADT-based solution together with a customer.
An essential requirement was low-latency responses when materializing graphs of several thousand nodes that are infrequently updated. We achieved this goal by reducing the traversal time of a 3,000-node graph from ~10 seconds to under a second.

ADT offers a powerful SQL-like query language to retrieve data out of a twin graph.
Traversing large graph sections requires many ADT queries. This blog post presents the in-memory caching solution that we used to speed up twin graph traversals.

Prerequisites

.NET Core 3.1 on your development machine.
Be familiar with C#, Azure Digital Twins and Azure Digital Twins Explorer.

Create your Azure Digital Twins graph

We want to represent the factories of a company named Contoso.

Create an Azure Digital Twins instance and make sure you have the Azure Digital Twins Data Owner role.
Open the Azure Digital Twins Explorer.
Download the contoso-tree.zip provided in the attachments and extract contoso-tree.json. In the explorer, select the import graph icon, choose the file to import it, and then save the graph.

You should see the following tree in the explorer.

The Azure Digital Twins Explorer shows that our Contoso company has two factories. Each factory is composed of rooms, and each room can contain machines.

Get the children of a twin

A common use case is the need to retrieve the children of a twin. For example, we want to be able to list the rooms of a factory.

You can display the children of Factory1 by running the following query:

SELECT C FROM DigitalTwins T JOIN C RELATED T.contains WHERE T.$dtId = 'Factory1'

You should see the following two twins.

The Azure Digital Twins Explorer helps you visualize the twins. When we develop an application, however, we need to retrieve the twins programmatically. Let’s try to retrieve the children of a node with C#.

You can start by creating a new Console Application project and include the packages Azure.DigitalTwins.Core, System.Linq.Async, Azure.Identity.

dotnet new console -lang "C#" -f netcoreapp3.1
dotnet add package Azure.DigitalTwins.Core
dotnet add package System.Linq.Async
dotnet add package Azure.Identity

Then we can create a simple AzureDigitalTwinsRepository class. It will use the DigitalTwinsClient to query ADT.

namespace AzureDigitalTwinsCachingExample
{
    using System.Collections.Generic;
    using Azure.DigitalTwins.Core;
    using System.Linq;
    using System.Threading.Tasks;

    public class AzureDigitalTwinsRepository
    {
        private readonly DigitalTwinsClient _client;

        public AzureDigitalTwinsRepository(DigitalTwinsClient client)
        {
            _client = client;
        }
    }
}

Add a method to get the children of a twin in the AzureDigitalTwinsRepository class.

public IAsyncEnumerable<BasicDigitalTwin> GetChildren(string id)
{
    // Each result row is a dictionary keyed by the alias used in the SELECT clause ("C").
    return _client.QueryAsync<IDictionary<string, BasicDigitalTwin>>
            ($"SELECT C FROM DigitalTwins T JOIN C RELATED T.contains WHERE T.$dtId = '{id}'")
        .Select(row => row["C"]);
}

We can use our AzureDigitalTwinsRepository to display the children of Factory1:

namespace AzureDigitalTwinsCachingExample
{
    using System;
    using System.Threading.Tasks;
    using Azure.DigitalTwins.Core;
    using Azure.Identity;

    class Program
    {
        public static async Task Main(string[] args)
        {
            var adtInstanceUrl = "https://<your-adt-hostname>";
            var credential = new DefaultAzureCredential();
            var client = new DigitalTwinsClient(new Uri(adtInstanceUrl), credential);
            // First call to avoid cold start in next steps.
            await client.GetDigitalTwinAsync<BasicDigitalTwin>("ContosoCompany");
            var adtRepository = new AzureDigitalTwinsRepository(client);

            var children = adtRepository.GetChildren("Factory1");
            await foreach (var child in children)
            {
                Console.Write($"{child.Id} ");
            }
        }
    }
}

Get the subtree of a twin

Now imagine that our application needs to retrieve the subtree of a twin: the twin itself, its descendants, and the relationships between them. We cannot achieve that with a single ADT query. Instead, we must traverse the tree, for example with a breadth-first search.

Add a method to get the subtree of a node in the AzureDigitalTwinsRepository class:

public async Task<IDictionary<string, (BasicDigitalTwin twin, HashSet<BasicDigitalTwin> children)>> GetSubtreeAsync(string sourceId)
{
    var queue = new Queue<BasicDigitalTwin>();
    var subtree = new Dictionary<string, (BasicDigitalTwin twin, HashSet<BasicDigitalTwin> children)>();
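    // Response<BasicDigitalTwin> converts implicitly to BasicDigitalTwin; the explicit type makes that visible.
    BasicDigitalTwin sourceTwin = await _client.GetDigitalTwinAsync<BasicDigitalTwin>(sourceId);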
    subtree[sourceId] = (sourceTwin, new HashSet<BasicDigitalTwin>());
    queue.Enqueue(sourceTwin);

    while (queue.Any())
    {
        var twin = queue.Dequeue();
        var children = GetChildren(twin.Id);
        await foreach (var child in children)
        {
            subtree[twin.Id].children.Add(child);
            if (subtree.ContainsKey(child.Id)) continue;
            queue.Enqueue(child);
            subtree[child.Id] = (child, new HashSet<BasicDigitalTwin>());
        }
    }

    return subtree;
}

When traversing the tree, we issue at least one ADT query per visited twin, one after another, so the duration of the whole operation grows with the size of the tree. To make the operation faster, let's see how we can cache the tree in memory.
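To make the cost of these sequential lookups concrete, here is a minimal sketch that runs the same breadth-first traversal against an in-memory stand-in and counts the child lookups it issues. Everything in it is an assumption made for illustration: the `RoundTripDemo` class, the fake lookup, and the room and machine IDs are hypothetical and not part of the Contoso sample or the ADT SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RoundTripDemo
{
    // Hypothetical stand-in for the twin graph: parent id -> child ids.
    public static readonly Dictionary<string, string[]> Graph = new Dictionary<string, string[]>
    {
        ["Factory1"] = new[] { "Room1", "Room2" },
        ["Room1"] = new[] { "Machine1", "Machine2" },
        ["Room2"] = new[] { "Machine3" },
    };

    // Breadth-first traversal that counts how many child lookups it issues.
    public static async Task<(int nodes, int lookups)> TraverseAsync(
        string sourceId, Func<string, Task<IReadOnlyList<string>>> getChildrenAsync)
    {
        var visited = new HashSet<string> { sourceId };
        var queue = new Queue<string>();
        queue.Enqueue(sourceId);
        var lookups = 0;

        while (queue.Count > 0)
        {
            var id = queue.Dequeue();
            lookups++; // one simulated query per dequeued twin
            foreach (var child in await getChildrenAsync(id))
            {
                // HashSet.Add returns false for ids that were already seen.
                if (visited.Add(child)) queue.Enqueue(child);
            }
        }

        return (visited.Count, lookups);
    }

    // Simulated lookup: served from memory, but awaited like a remote call.
    public static Task<IReadOnlyList<string>> FakeGetChildrenAsync(string id)
    {
        IReadOnlyList<string> children =
            Graph.TryGetValue(id, out var found) ? found : Array.Empty<string>();
        return Task.FromResult(children);
    }
}
```

Calling `RoundTripDemo.TraverseAsync("Factory1", RoundTripDemo.FakeGetChildrenAsync)` on this six-twin graph reports six nodes and six lookups. Against a real ADT instance each of those lookups would be a network round trip awaited in sequence, which is why the latency adds up on large graphs.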

Caching

A secondary datastore can serve as a data cache to accelerate application operations while avoiding the need to query ADT multiple times in complex operations.
We decided to implement a simple in-memory cache because the data we were interested in was small enough to fit in memory and is infrequently updated. This relatively simple approach let us avoid additional infrastructure complexity.

The cache must contain a subset of the twin graph transformed into a data structure appropriate for the problem at hand. Depending on the use case, it might be necessary to store data as a subgraph of the twin graph. Still, there might be other situations where simpler data structures like lists or maps simplify the cache implementation. We used a simple in-memory adjacency list.

We want to store the Contoso tree in memory as an adjacency list.
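As a minimal illustration of what an adjacency list looks like, the sketch below builds one from parent/child pairs using plain string IDs. The `AdjacencyListSketch` class is hypothetical and only shows the shape of the data; the cache in this article stores `BasicDigitalTwin` objects rather than strings.

```csharp
using System.Collections.Generic;

public static class AdjacencyListSketch
{
    // Builds a map of parent id -> child ids from (parent, child) edges.
    // Leaves also get an (empty) entry, so lookups never hit a missing key.
    public static Dictionary<string, List<string>> Build(
        IEnumerable<(string parent, string child)> edges)
    {
        var graph = new Dictionary<string, List<string>>();
        foreach (var (parent, child) in edges)
        {
            if (!graph.ContainsKey(parent)) graph[parent] = new List<string>();
            if (!graph.ContainsKey(child)) graph[child] = new List<string>();
            graph[parent].Add(child);
        }

        return graph;
    }
}
```

Giving every twin its own entry, including leaves, is what allows a traversal to index the map directly with any twin ID it encounters.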

Let's create a caching repository. The caching repository uses the AzureDigitalTwinsRepository that we implemented to reload the cache.

namespace AzureDigitalTwinsCachingExample
{
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Azure.DigitalTwins.Core;

    public class CachingRepository
    {
        private readonly AzureDigitalTwinsRepository _adtRepository;
        private IDictionary<string, (BasicDigitalTwin twin, HashSet<BasicDigitalTwin> children)> _graph;

        public CachingRepository(AzureDigitalTwinsRepository adtRepository)
        {
            _adtRepository = adtRepository;
        }

        public async Task ReloadCache()
        {
            // Reload the tree from the root.
            _graph = await _adtRepository.GetSubtreeAsync("ContosoCompany");
        }
    }
}

We can add a GetSubtree method that traverses the in-memory graph instead of making several requests to ADT. The only difference from the previous implementation is that we get the digital twin and its children from the in-memory graph.

public IDictionary<string, (BasicDigitalTwin twin, HashSet<BasicDigitalTwin> children)> GetSubtree(string sourceId)
{
    var queue = new Queue<BasicDigitalTwin>();
    var subtree = new Dictionary<string, (BasicDigitalTwin twin, HashSet<BasicDigitalTwin> children)>();
    var sourceTwin = _graph[sourceId].twin;
    subtree[sourceId] = (sourceTwin, new HashSet<BasicDigitalTwin>());
    queue.Enqueue(sourceTwin);

    while (queue.Any())
    {
        var twin = queue.Dequeue();
        var children = _graph[twin.Id].children;
        foreach (var child in children)
        {
            subtree[twin.Id].children.Add(child);
            if (subtree.ContainsKey(child.Id)) continue;
            queue.Enqueue(child);
            subtree[child.Id] = (child, new HashSet<BasicDigitalTwin>());
        }
    }

    return subtree;
}

We can now measure the duration of the two GetSubtree implementations.

namespace AzureDigitalTwinsCachingExample
{
    using System;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;
    using Azure.DigitalTwins.Core;
    using Azure.Identity;

    class Program
    {
        public static async Task Main(string[] args)
        {
            var adtInstanceUrl = "https://<your-adt-hostname>";

            // Authenticate and create a client
            var credential = new DefaultAzureCredential();
            var client = new DigitalTwinsClient(new Uri(adtInstanceUrl), credential);
            // First call to avoid cold start in next steps.
            await client.GetDigitalTwinAsync<BasicDigitalTwin>("ContosoCompany");

            var adtRepository = new AzureDigitalTwinsRepository(client);
            var cachingRepository = new CachingRepository(adtRepository);
            // Reloading the cache takes some time.
            await cachingRepository.ReloadCache();

            var stopwatch = Stopwatch.StartNew();
            var subtree = await adtRepository.GetSubtreeAsync("Factory1");
            stopwatch.Stop();
            Console.WriteLine($"Got subtree with {subtree.Count()} nodes in {stopwatch.ElapsedMilliseconds} ms");

            stopwatch.Restart();
            var subtreeFromCache = cachingRepository.GetSubtree("Factory1");
            stopwatch.Stop();
            Console.WriteLine(
                $"Got subtree with {subtreeFromCache.Count()} nodes in cache in {stopwatch.ElapsedMilliseconds} ms");
        }
    }
}

Cache loading and invalidation

You can preload the cache when the service starts and invalidate it when the graph is updated.
Event notifications can be a great trigger for that, and Azure Digital Twins provides different types of events.
We wanted to avoid additional dependencies, so we created an extra twin in ADT and used it as an indicator of the last graph update. The service updates the indicator twin whenever it modifies the graph in ADT. It also periodically checks whether the indicator twin has changed and refreshes the cache if so.
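The polling approach described above could be sketched as follows. This is not the code we shipped: the `IndicatorCacheRefresher` class and its two delegates are hypothetical. It assumes you can read some value from the indicator twin that changes on every graph update, and that reloading maps to a call such as `ReloadCache()`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class IndicatorCacheRefresher
{
    // Assumption: returns a value from the indicator twin that changes on every graph update.
    private readonly Func<Task<string>> _readIndicatorVersionAsync;
    // Assumption: reloads the cache, e.g. () => cachingRepository.ReloadCache().
    private readonly Func<Task> _reloadCacheAsync;
    private string _lastSeenVersion;

    public IndicatorCacheRefresher(
        Func<Task<string>> readIndicatorVersionAsync, Func<Task> reloadCacheAsync)
    {
        _readIndicatorVersionAsync = readIndicatorVersionAsync;
        _reloadCacheAsync = reloadCacheAsync;
    }

    // Returns true when the indicator changed and the cache was reloaded.
    public async Task<bool> RefreshIfStaleAsync()
    {
        var version = await _readIndicatorVersionAsync();
        if (version == _lastSeenVersion) return false;

        await _reloadCacheAsync();
        _lastSeenVersion = version; // recorded only after a successful reload
        return true;
    }

    // Polls at a fixed interval until cancellation is requested.
    public async Task RunAsync(TimeSpan interval, CancellationToken cancellationToken)
    {
        try
        {
            while (true)
            {
                await RefreshIfStaleAsync();
                await Task.Delay(interval, cancellationToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation requested: stop polling.
        }
    }
}
```

Because the last seen version starts out empty, the first check always reloads, which doubles as the preload at service start. The polling interval bounds how stale the cache can get, so choose it according to how quickly your consumers need to see graph changes.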

Conclusion

Azure Digital Twins is a powerful tool to create a digital representation of an environment, and we have seen how caching can be used to enhance the performance of twin graph traversals.
An additional advantage is ADT cost optimization. ADT pricing includes a cost per query unit, and a cache may reduce the number of query units your system consumes. However, reloading the cache consumes query units too. The net saving depends on how expensive the queries you avoid are compared with the cost of refreshing the cache. That's why you need to run your own analysis to understand what is best for your system and whether this type of strategy helps in your case.
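One rough way to frame that analysis is to compare the query units spent on uncached traversals with those spent on cache reloads over the same period. The helper below is a hypothetical sketch, not a pricing calculator: its name and parameters are assumptions, and you would need to measure the actual query-unit figures for your own workload.

```csharp
public static class QueryUnitEstimate
{
    // Net query units saved over a period; a positive result means the cache pays off.
    public static double NetSavings(
        double traversals, double unitsPerTraversal,
        double reloads, double unitsPerReload)
    {
        return traversals * unitsPerTraversal - reloads * unitsPerReload;
    }
}
```

With purely illustrative numbers, 1,000 traversals at 50 query units each against 24 reloads at 500 query units each gives `NetSavings(1000, 50, 24, 500)` = 38,000 query units saved. If the graph changed so often that reloads outnumbered the traversals they replace, the result would turn negative.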

Now you can try this strategy and share your experience and workarounds in the comments below!

Contributors

Marc Gomez
Alexandre Gattiker
Christopher Lomonico
Izabela Kulakowska
Max Zeier
Peeyush Chandel

Posted at https://sl.advdat.com/36Ojm42

Tuesday, July 20, 2021
