Wednesday, September 29, 2021

Azure Cognitive Search--- CustomEntityLookupSkill and How to output this custom skill entity to Know

This blog explains how to use the CustomEntityLookupSkill in an Azure Cognitive Search skillset and how to save its results to a knowledge store.

Prerequisites:

Before reading this blog, make sure you have basic knowledge of Azure Cognitive Search. The following articles are useful references:

Introduction to Azure Cognitive Search explains what Azure Cognitive Search is.

Indexer overview explains indexers and skillsets.

Knowledge store concepts - Azure Cognitive Search defines what a knowledge store is.

Azure Cognitive Search is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.

A search index is the searchable content, stored as JSON documents, that full-text queries run against.

Indexers drive the AI enrichment capabilities of Cognitive Search, integrating external processing of content en route to an index.

The output of AI enrichment can also be saved to Azure Table storage and Blob storage; this storage is called a knowledge store.

A skillset is a reusable resource in Azure Cognitive Search that is attached to an indexer. A skillset can contain built-in skills and custom skills; CustomEntityLookupSkill is one of the built-in skills.

 

Test:

With the background covered, let’s look at the tests for the Custom Entity Lookup skill.

According to the document Custom Entity Lookup cognitive search skill, this skill looks for text in documents that matches a custom, user-defined list of words and phrases (entities).

You can find the Custom Entity Lookup skill when you edit a skillset (Azure portal -> Cognitive Search -> Skillsets), as shown in the image below.

Scarlett_liu_0-1632895172318.png

The portal provides a template that does not contain any entities.

{
  "@odata.type": "#Microsoft.Skills.Text.CustomEntityLookupSkill",
  "defaultLanguageCode": "",
  "entitiesDefinitionUri": "",
  "inlineEntitiesDefinition": [
    {
      "name": "",
      "description": "",
      "type": "",
      "subtype": "",
      "id": "",
      "caseSensitive": true,
      "accentSensitive": true,
      "fuzzyEditDistance": 0,
      "defaultCaseSensitive": true,
      "defaultAccentSensitive": true,
      "defaultFuzzyEditDistance": 0,
      "aliases": " "
    }
  ],
  "globalDefaultCaseSensitive": true,
  "globalDefaultAccentSensitive": true,
  "globalDefaultFuzzyEditDistance": 0,
  "name": "",
  "description": "",
  "context": "",
  "inputs": [
    {
      "name": "text",
      "source": ""
    }
  ],
  "outputs": [
    {
      "name": "entities",
      "targetName": "entities"
    }
  ]
}

 

Below is the example entity definition from the documentation.

  • The entity “Bill Gates” includes aliases, so text such as “William H. Gates” in a document is also returned as a match for “Bill Gates”.
  • The entity “Xbox One” has the ID “4e36bf9d-5550-4396-8647-8e43d7564a76”; this ID is included in the output whenever “Xbox One” is matched in a document.

{
  "@odata.type": "#Microsoft.Skills.Text.CustomEntityLookupSkill",
  "name": "#12",
  "description": "",
  "context": "/document/merged_content",
  "defaultLanguageCode": "en",
  "entitiesDefinitionUri": "",
  "globalDefaultCaseSensitive": true,
  "globalDefaultAccentSensitive": true,
  "globalDefaultFuzzyEditDistance": 0,
  "inputs": [
    {
      "name": "text",
      "source": "/document/merged_content"
    }
  ],
  "outputs": [
    {
      "name": "entities",
      "targetName": "customEntities"
    }
  ],
  "inlineEntitiesDefinition": [
    {
      "name": "Bill Gates",
      "description": "Microsoft founder.",
      "aliases": [
        { "text": "William H. Gates", "caseSensitive": false },
        { "text": "BillG", "caseSensitive": true }
      ]
    },
    {
      "name": "Xbox One",
      "type": "Hardware",
      "subtype": "Gaming Device",
      "id": "4e36bf9d-5550-4396-8647-8e43d7564a76",
      "description": "The Xbox One product"
    }
  ]
}
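
To make the alias behaviour concrete, here is a minimal Python sketch of how a term could be mapped back to its root entity name. It is only a toy illustration of the definition above, not the matching algorithm the service actually uses. The helper names `same_text` and `lookup` are hypothetical, and fuzzy and accent matching are ignored.

```python
from typing import Dict, List, Optional

# Entity definitions copied from the skill above (only the fields used here).
ENTITIES: List[Dict] = [
    {
        "name": "Bill Gates",
        "aliases": [
            {"text": "William H. Gates", "caseSensitive": False},
            {"text": "BillG", "caseSensitive": True},
        ],
    },
    {"name": "Xbox One", "id": "4e36bf9d-5550-4396-8647-8e43d7564a76"},
]

# Mirrors "globalDefaultCaseSensitive": true in the skill definition.
GLOBAL_DEFAULT_CASE_SENSITIVE = True


def same_text(term: str, candidate: str, case_sensitive: bool) -> bool:
    """Compare two strings, optionally ignoring case (hypothetical helper)."""
    if case_sensitive:
        return term == candidate
    return term.lower() == candidate.lower()


def lookup(term: str) -> Optional[str]:
    """Return the root entity name for a term, or None if nothing matches."""
    for entity in ENTITIES:
        if same_text(term, entity["name"], GLOBAL_DEFAULT_CASE_SENSITIVE):
            return entity["name"]
        for alias in entity.get("aliases", []):
            # An alias without its own flag falls back to the global default.
            sensitive = alias.get("caseSensitive", GLOBAL_DEFAULT_CASE_SENSITIVE)
            if same_text(term, alias["text"], sensitive):
                return entity["name"]
    return None


if __name__ == "__main__":
    for term in ["william h. gates", "BillG", "billg", "Xbox One"]:
        print(term, "->", lookup(term))
```

In this sketch the lowercase “william h. gates” still resolves to “Bill Gates” because that alias is not case sensitive, while “billg” does not, because the “BillG” alias is.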

So how can we export the output entities to a knowledge store?

When you import a data source, there is an option to add a knowledge store. You can choose Table storage or Blob storage in an Azure Storage account.

Scarlett_liu_1-1632895250784.png

 

 

However, when we use a debug session (Azure portal -> Cognitive Search -> Debug session) to save the output to the knowledge store, only the entity IDs are saved; none of the saved entities include the custom skill results.

Scarlett_liu_2-1632895279760.png

Notice that the output here is “entities”, which is an array rather than plain text, so it cannot be saved to the knowledge store directly.

So we need to add a knowledge store definition to the skillset definition JSON.

For example:

"knowledgeStore": {
  "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>",
  "projections": [
    {
      "tables": [ ],
      "objects": [ ],
      "files": [ ]
    }
  ]
}
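
If you prefer to apply such a definition outside the portal, one option is the Create or Update Skillset REST API. The Python sketch below only builds the PUT request and does not send it. The service name, the skillset name, and the `build_skillset_request` helper are placeholders chosen for illustration, and api-version 2020-06-30 is an assumption about the version in use.

```python
import json
import urllib.request

# All of these values are hypothetical placeholders.
SERVICE_NAME = "my-search-service"
SKILLSET_NAME = "my-skillset"
API_KEY = "<YOUR-ADMIN-API-KEY>"
API_VERSION = "2020-06-30"  # assumed API version


def build_skillset_request(skillset: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates or updates a skillset."""
    url = (
        f"https://{SERVICE_NAME}.search.windows.net/skillsets/"
        f"{SKILLSET_NAME}?api-version={API_VERSION}"
    )
    body = json.dumps(skillset).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": API_KEY},
    )


skillset = {
    "name": SKILLSET_NAME,
    "skills": [],  # the CustomEntityLookupSkill definition goes here
    "knowledgeStore": {
        "storageConnectionString": "<YOUR-AZURE-STORAGE-ACCOUNT-CONNECTION-STRING>",
        "projections": [{"tables": [], "objects": [], "files": []}],
    },
}

request = build_skillset_request(skillset)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```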

 

Continuing with the skillset above, here is how to use Table storage.

Add the value below to "tables": [ ]. Be careful with the source: it is built from the skill's context (here the same path as its input source) and its output name. In the skill above the input source is "/document/merged_content" and the output targetName is "customEntities", so both sourceContext and the input source are "/document/merged_content/customEntities".

{
  "tableName": "azureblobSkillset202108031612Custom",
  "referenceKeyName": null,
  "generatedKeyName": "Imagesid",
  "source": null,
  "sourceContext": "/document/merged_content/customEntities",
  "inputs": [
    {
      "name": "customEntities",
      "source": "/document/merged_content/customEntities",
      "sourceContext": null,
      "inputs": []
    }
  ]
}

Here is the result in the knowledge store's Table storage.

Scarlett_liu_3-1632895308860.png
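
The way the projection path is put together from the skill definition can be sketched in a few lines of Python. This is only an illustration under the assumption that the output lands at the skill's context plus the output's targetName; `projection_source` and `table_projection` are hypothetical helper names, not part of any SDK.

```python
def projection_source(skill: dict, output_name: str = "entities") -> str:
    """Derive the enrichment path of a skill output: <context>/<targetName>."""
    context = skill.get("context", "/document").rstrip("/")
    for output in skill["outputs"]:
        if output["name"] == output_name:
            # Fall back to the output name when no targetName is given.
            return f"{context}/{output.get('targetName', output['name'])}"
    raise KeyError(f"skill has no output named {output_name!r}")


def table_projection(skill: dict, table_name: str, key_name: str) -> dict:
    """Build a table projection entry shaped like the one shown above."""
    source = projection_source(skill)
    return {
        "tableName": table_name,
        "referenceKeyName": None,
        "generatedKeyName": key_name,
        "source": None,
        "sourceContext": source,
        "inputs": [
            {
                "name": source.rsplit("/", 1)[-1],
                "source": source,
                "sourceContext": None,
                "inputs": [],
            }
        ],
    }


# Only the fields of the skill that matter for the path.
skill = {
    "context": "/document/merged_content",
    "outputs": [{"name": "entities", "targetName": "customEntities"}],
}

projection = table_projection(skill, "azureblobSkillset202108031612Custom", "Imagesid")
print(projection["sourceContext"])
```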

 

The results can also be saved to Blob storage, but this requires an objects projection in the skillset's knowledgeStore definition.

Here each entity needs to be written as a separate block blob, so I changed the source as follows.

"objects": [
  {
    "storageContainer": "testmysearchcustom20210803",
    "referenceKeyName": null,
    "generatedKeyName": "testmysearchcustom20210803Key",
    "sourceContext": null,
    "source": "/document/merged_content/customEntities/*",
    "inputs": []
  }
]

The results are saved in a folder named after the key, which plays a role similar to a partition key.

Scarlett_liu_4-1632895340978.png

 

Each result is added to this folder as a JSON file.

Scarlett_liu_5-1632895340985.png
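
One way to read the "/*" at the end of the object projection source is "one item per array element". The Python sketch below illustrates that reading with a toy path resolver. It is not how the service is implemented: the enrichment tree is simplified to nested dictionaries, and `resolve` is a hypothetical helper.

```python
from typing import Any, List


def resolve(tree: dict, path: str) -> List[Any]:
    """Resolve an enrichment-style path; '*' fans out over array elements."""
    nodes: List[Any] = [tree]
    for part in path.strip("/").split("/"):
        next_nodes: List[Any] = []
        for node in nodes:
            if part == "*":
                next_nodes.extend(node)  # one node per array element
            else:
                next_nodes.append(node[part])
        nodes = next_nodes
    return nodes


# Simplified stand-in for an enriched document.
tree = {
    "document": {
        "merged_content": {
            "customEntities": [{"name": "Bill Gates"}, {"name": "Xbox One"}]
        }
    }
}

whole_array = resolve(tree, "/document/merged_content/customEntities")
each_entity = resolve(tree, "/document/merged_content/customEntities/*")
print(len(whole_array), len(each_entity))
```

Without "/*" the path points at the array as a single node; with "/*" each entity becomes its own node, which matches the one JSON file per entity seen in the container.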

 

Below is one of the JSON results:

“{"name":"Bill Gates","description":"Microsoft Founder","id":"","type":"person","subtype":"","matches":[{"text":"william h. gates","offset":379232,"length":16,"matchDistance":0.0},{"text":"william h. gates","offset":379264,"length":16,"matchDistance":0.0},{"text":"william h. gates","offset":391596,"length":16,"matchDistance":0.0}],"Imagesid":"aHR0cHM6Ly9jczIxMDAzMjAwMGI5MjdlNDUxLmJsb2IuY29yZS53aW5kb3dzLm5ldC90ZXN0ZGF0YXNvdXJjZS9NU0ZUX0ZZMTdfMTBLLmRvY3g1_3B08702C71C2_merged_content_customEntities"}”
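
A saved result like the one above can be read back with a few lines of Python. This is a small sketch that assumes the blob content has already been downloaded into a string; the "Imagesid" key is left out here to keep the sample short.

```python
import json

# JSON copied from the result above, without the "Imagesid" key.
RESULT = (
    '{"name":"Bill Gates","description":"Microsoft Founder","id":"",'
    '"type":"person","subtype":"","matches":['
    '{"text":"william h. gates","offset":379232,"length":16,"matchDistance":0.0},'
    '{"text":"william h. gates","offset":379264,"length":16,"matchDistance":0.0},'
    '{"text":"william h. gates","offset":391596,"length":16,"matchDistance":0.0}]}'
)

entity = json.loads(RESULT)

# Collect where in the merged content each match was found.
offsets = [match["offset"] for match in entity["matches"]]
matched_texts = {match["text"] for match in entity["matches"]}

print(entity["name"], "matched", len(offsets), "times at offsets", offsets)
print("matched text variants:", sorted(matched_texts))
```

Each entry in "matches" records the matched text, its offset and length in the source text, and the match distance, so the entity name alone does not tell you which alias was actually found.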

 

These are all the results from my tests. To save CustomEntityLookupSkill output to a knowledge store, you need to add a knowledge store definition to the skillset definition. Also make sure the sourceContext and source are correct for the tables or objects.

Posted at https://sl.advdat.com/3kNOFDV