Using Azure Cognitive Services Speech to Text and Logic Apps
No Code — Workflow style
Please use this article as a reference, since it might not match your exact use case, but it shows how AI can help drive insights and automate processing of large audio datasets.
Prerequisites
- Azure Account
- Azure Storage account
- Azure Cognitive Services
- Azure Logic Apps
- Get the connection string for the storage account
- Get the primary key to be used as the subscription key for Cognitive Services
- The audio file should be in WAV format
- The audio file cannot be too big
- Audio duration should be 10 minutes or less
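The audio prerequisites above can be checked before uploading. A minimal sketch using Python's standard wave module (the function name and the 16 kHz expectation, which matches the samplerate header used later, are illustrative):

```python
import wave

def validate_wav(path, max_seconds=600):
    """Check that a file meets the workflow's audio prerequisites:
    WAV container, 16 kHz sample rate, and at most 10 minutes long."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        seconds = wf.getnframes() / float(rate)
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "seconds": round(seconds, 2),
            "ok": rate == 16000 and seconds <= max_seconds,
        }
```

Running this on a candidate file tells you up front whether the Speech API is likely to accept it, instead of waiting for the Logic App run to fail.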
Logic Apps
- First, create a trigger that fires when a blob is added or modified
- Create a connection using the blob storage connection string
- Now bring in the “Reads Blob Content from Azure Storage” action
- Container name: audioinput
- Choose dynamic content and select the blob name from the trigger output
- Bring in an HTTP action
- Here we need to call the Speech to Text API and pass the following headers
- Accept: application/json;text/xml
- Content-type: audio/wav; codecs=audio/pcm; samplerate=16000
- Expect: 100-continue
- Ocp-Apim-Subscription-Key: xxxx-xxxxxx-xxxxxx-xxxx
- Transfer-Encoding: chunked
- For the Body, choose the blob content read earlier
- This passes the raw audio binary to the Cognitive Services Speech API
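Outside the designer, the same HTTP action can be sketched in Python. This assumes the Speech service short-audio REST endpoint (`<region>.stt.speech.microsoft.com`) with `format=detailed` to get the NBest list shown later; the function name and region value are illustrative:

```python
def build_stt_request(region, subscription_key, language="en-US"):
    """Mirror the Logic Apps HTTP action: URL and headers for the
    Speech to Text short-audio REST API (detailed output returns NBest)."""
    url = (f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
           f"conversation/cognitiveservices/v1"
           f"?language={language}&format=detailed")
    headers = {
        "Accept": "application/json;text/xml",
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Expect": "100-continue",
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Transfer-Encoding": "chunked",
    }
    return url, headers
```

With the requests library you would then post the audio bytes, e.g. `requests.post(url, headers=headers, data=wav_bytes)`.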
- Now let's parse the API output with a Parse JSON action
- Select the Body from the HTTP output
- Provide the schema as follows:
{
  "properties": {
    "Duration": {
      "type": "integer"
    },
    "NBest": {
      "items": {
        "properties": {
          "Confidence": {
            "type": "number"
          },
          "Display": {
            "type": "string"
          },
          "ITN": {
            "type": "string"
          },
          "Lexical": {
            "type": "string"
          },
          "MaskedITN": {
            "type": "string"
          }
        },
        "required": [
          "Confidence",
          "Lexical",
          "ITN",
          "MaskedITN",
          "Display"
        ],
        "type": "object"
      },
      "type": "array"
    },
    "Offset": {
      "type": "integer"
    },
    "RecognitionStatus": {
      "type": "string"
    }
  },
  "type": "object"
}
- Now add an action to upload the result to blob storage
- Give an output container name
- Give an output blob name
- Go to Overview, click Run Trigger, and then click Run
- Upload the WAV file
- Wait for it to process; give the Speech API some time
- Now go to blob storage and check the output
{
"RecognitionStatus": "Success",
"Offset": 300000,
"Duration": 524000000,
"NBest": [
{
"Confidence": 0.972784698009491,
"Lexical": "the speech SDK exposes many features from the speech service but not all of them the capabilities of the speech SDK are often associated with scenarios the speech SDK is ideal for both real time and non real time scenarios using local devices files azure blob storage and even input and output streams when a scenario is not achievable with the speech SDK look for a rest API alternative speech to text also known as speech recognition transcribes audio streams to text that your applications tools or devices can consume more display use speech to text with language understanding louis to deride user intents from transcribed speech and act on voice commands you speech translation to translate speech input to a different language with a single call for more information see speech to text basics",
"ITN": "the speech SDK exposes many features from the speech service but not all of them the capabilities of the speech SDK are often associated with scenarios the speech SDK is ideal for both real time and non real time scenarios using local devices files azure blob storage and even input and output streams when a scenario is not achievable with the speech SDK look for a rest API alternative speech to text also known as speech recognition transcribes audio streams to text that your applications tools or devices can consume more display use speech to text with language understanding louis to deride user intents from transcribed speech and act on voice commands you speech translation to translate speech input to a different language with a single call for more information see speech to text basics",
"MaskedITN": "the speech sdk exposes many features from the speech service but not all of them the capabilities of the speech sdk are often associated with scenarios the speech sdk is ideal for both real time and non real time scenarios using local devices files azure blob storage and even input and output streams when a scenario is not achievable with the speech sdk look for a rest api alternative speech to text also known as speech recognition transcribes audio streams to text that your applications tools or devices can consume more display use speech to text with language understanding louis to deride user intents from transcribed speech and act on voice commands you speech translation to translate speech input to a different language with a single call for more information see speech to text basics",
"Display": "The Speech SDK exposes many features from the speech service, but not all of them. The capabilities of the speech SDK are often associated with scenarios. The Speech SDK is ideal for both real time and non real time scenarios using local devices files, Azure blob storage and even input and output streams. When a scenario is not achievable with the speech SDK, look for a rest API. Alternative speech to text, also known as speech recognition, transcribes audio streams to text that your applications, tools or devices can consume more display use speech to text with language, understanding Louis to deride user intents from transcribed speech and act on voice commands. You speech translation to translate speech input to a different language with a single call. For more information, see speech to text basics."
}
]
}
- The above is a sample output
- The Confidence score and the Display text are both available
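The parsed response can be consumed programmatically as well. A sketch (function name illustrative) that follows the response shape above; note Duration and Offset are in 100-nanosecond ticks:

```python
def best_transcript(stt_response):
    """Pick the highest-confidence alternative from the NBest list and
    return its Display text plus the audio duration in seconds."""
    if stt_response.get("RecognitionStatus") != "Success":
        return None
    best = max(stt_response["NBest"], key=lambda n: n["Confidence"])
    return {
        "display": best["Display"],
        "confidence": best["Confidence"],
        "seconds": stt_response["Duration"] / 10_000_000,  # ticks -> seconds
    }
```

For the sample above, Duration 524000000 corresponds to about 52.4 seconds of audio.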
- Now process the text with Text Analytics to pull key phrases, PII, sentiment, and entities
- Create 3 variables: one each for id, text, and language
- Create id
- Create language
- Create text
- Next is a Compose action:
{
  "documents": [
    {
      "id": "@{variables('id')}",
      "language": "@{variables('language')}",
      "text": "@{variables('text')}"
    }
  ]
}
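The Compose step just assembles the standard Text Analytics request body. A minimal sketch of the same payload in Python (function name illustrative):

```python
import json

def compose_documents(doc_id, language, text):
    """Build the Text Analytics v3.1 request body that the
    Compose action produces from the three variables."""
    return {"documents": [{"id": doc_id, "language": language, "text": text}]}

# The same body is reused for key phrases, PII, sentiment, and entities.
body = compose_documents("1", "en", "sample transcript text")
payload = json.dumps(body)
```

All four Text Analytics calls below can post this one payload, which is why a single Compose action is enough.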
- Call the Text Analytics key phrases API
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/keyPhrases
- Provide the header Ocp-Apim-Subscription-Key
- Header Content-Type: application/json
- Body: the content from the Compose output
- Parse JSON output
- Schema
{
  "properties": {
    "documents": {
      "items": {
        "properties": {
          "id": {
            "type": "string"
          },
          "keyPhrases": {
            "items": {
              "type": "string"
            },
            "type": "array"
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "id",
          "keyPhrases",
          "warnings"
        ],
        "type": "object"
      },
      "type": "array"
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  },
  "type": "object"
}
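Downstream of the Parse JSON action, the key phrases are a simple list per document. A sketch (function name illustrative) of flattening them, following the v3.1 response shape in the schema above:

```python
def extract_key_phrases(response):
    """Flatten key phrases across all documents in a keyPhrases response."""
    phrases = []
    for doc in response.get("documents", []):
        phrases.extend(doc.get("keyPhrases", []))
    return phrases
```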
- Delete the existing blob
- Name of blob: textanalytics.json
- Now save the output to a blob
- Name of blob: textanalytics.json
- Now call Text Analytics for PII
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/entities/recognition/pii
- Provide the header Ocp-Apim-Subscription-Key
- Header Content-Type: application/json
- Body: the content from the Compose output
- Now bring in a Parse JSON action
{
  "type": "object",
  "properties": {
    "documents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "redactedText": {
            "type": "string"
          },
          "id": {
            "type": "string"
          },
          "entities": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "text": {
                  "type": "string"
                },
                "category": {
                  "type": "string"
                },
                "offset": {
                  "type": "integer"
                },
                "length": {
                  "type": "integer"
                },
                "confidenceScore": {
                  "type": "number"
                }
              },
              "required": [
                "text",
                "category",
                "offset",
                "length",
                "confidenceScore"
              ]
            }
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "redactedText",
          "id",
          "entities",
          "warnings"
        ]
      }
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  }
}
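The most useful parts of the PII response are the redacted text and the entity categories that were detected. A sketch (function name illustrative) that follows the schema above, assuming a single document:

```python
def summarize_pii(response):
    """Return the redacted text and the set of PII entity categories
    found in the first document of a PII recognition response."""
    doc = response["documents"][0]
    return {
        "redacted": doc["redactedText"],
        "categories": sorted({e["category"] for e in doc["entities"]}),
    }
```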
- Now bring in a delete blob action
- Blob name: textpii.json
- Now save the file to a blob
- Blob name: textpii.json
- Now call the Sentiment API
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/sentiment
- Provide the header Ocp-Apim-Subscription-Key
- Header Content-Type: application/json
- Body: the content from the Compose output
- Bring in a Parse JSON action
{
  "type": "object",
  "properties": {
    "documents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {
            "type": "string"
          },
          "sentiment": {
            "type": "string"
          },
          "confidenceScores": {
            "type": "object",
            "properties": {
              "positive": {
                "type": "number"
              },
              "neutral": {
                "type": "number"
              },
              "negative": {
                "type": "number"
              }
            }
          },
          "sentences": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "sentiment": {
                  "type": "string"
                },
                "confidenceScores": {
                  "type": "object",
                  "properties": {
                    "positive": {
                      "type": "number"
                    },
                    "neutral": {
                      "type": "number"
                    },
                    "negative": {
                      "type": "number"
                    }
                  }
                },
                "offset": {
                  "type": "integer"
                },
                "length": {
                  "type": "integer"
                },
                "text": {
                  "type": "string"
                }
              },
              "required": [
                "sentiment",
                "confidenceScores",
                "offset",
                "length",
                "text"
              ]
            }
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "id",
          "sentiment",
          "confidenceScores",
          "sentences",
          "warnings"
        ]
      }
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  }
}
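The sentiment response carries both a document-level label and per-sentence labels. A sketch (function name illustrative) summarizing a single document per the schema above:

```python
def sentiment_summary(response):
    """Return the overall document sentiment plus a count of
    per-sentence sentiment labels."""
    doc = response["documents"][0]
    counts = {}
    for sentence in doc["sentences"]:
        label = sentence["sentiment"]
        counts[label] = counts.get(label, 0) + 1
    return {"overall": doc["sentiment"], "sentence_counts": counts}
```

For a long transcript, the sentence counts give a quick feel for how mixed the audio really was, beyond the single overall label.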
- Now bring in a delete blob action
- Blob name: textsentiment.json
- Now bring in a save blob action
- Blob name: textsentiment.json
- Now get the entities
https://cogsvcname.cognitiveservices.azure.com/text/analytics/v3.1/entities/recognition/general
- Provide the header Ocp-Apim-Subscription-Key
- Header Content-Type: application/json
- Body: the content from the Compose output
- Bring in a Parse JSON action
{
  "type": "object",
  "properties": {
    "documents": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": {
            "type": "string"
          },
          "entities": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "text": {
                  "type": "string"
                },
                "category": {
                  "type": "string"
                },
                "subcategory": {
                  "type": "string"
                },
                "offset": {
                  "type": "integer"
                },
                "length": {
                  "type": "integer"
                },
                "confidenceScore": {
                  "type": "number"
                }
              },
              "required": [
                "text",
                "category",
                "offset",
                "length",
                "confidenceScore"
              ]
            }
          },
          "warnings": {
            "type": "array"
          }
        },
        "required": [
          "id",
          "entities",
          "warnings"
        ]
      }
    },
    "errors": {
      "type": "array"
    },
    "modelVersion": {
      "type": "string"
    }
  }
}
- Now bring in a delete blob action
- Blob name: textentities.json
- Now save the final output
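To make the saved entity output easier to consume, the entities can be grouped by category and filtered by confidence. A sketch (function name and threshold are illustrative) following the general entity recognition schema above:

```python
def group_entities(response, min_confidence=0.5):
    """Group recognized entities by category, keeping only those
    whose confidence score clears the threshold."""
    groups = {}
    for doc in response.get("documents", []):
        for entity in doc["entities"]:
            if entity["confidenceScore"] >= min_confidence:
                groups.setdefault(entity["category"], []).append(entity["text"])
    return groups
```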
Original article at — Samples2021/audiotext.md at main · balakreshnan/Samples2021 (github.com)
Posted at https://sl.advdat.com/3wK65Wv