Wednesday, November 3, 2021

Announcing General Availability of Speaker Recognition

How Does it Work?

Once you have provided audio training data for a single speaker, the speaker recognition service creates an enrollment profile based on the unique characteristics of the speaker's voice (also known as a voice signature). You can then cross-check audio voice samples against this profile to verify that the speaker is the same person (speaker verification), or cross-check audio voice samples against a group of enrolled speaker profiles to see whether they match any profile in the group (speaker identification).
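The difference between the two modes can be illustrated with a small toy model. This is only a sketch under stated assumptions: the voice signature is treated as a plain list of numbers, the comparison is a cosine similarity, and the threshold of 0.8 is arbitrary. None of these details come from the service, whose actual features and scoring are not described in this post.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two feature vectors (toy stand-in for voice signatures)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def verify(profile, sample, threshold=0.8):
    """Verification (1:1): does the sample match this one enrolled profile?"""
    return cosine_similarity(profile, sample) >= threshold

def identify(profiles, sample, threshold=0.8):
    """Identification (1:N): which enrolled profile, if any, matches best?"""
    best_id, best_score = None, threshold
    for speaker_id, profile in profiles.items():
        score = cosine_similarity(profile, sample)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id

# Hypothetical enrolled profiles and an incoming sample.
profiles = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.3]}
sample = [0.85, 0.15, 0.25]

print(verify(profiles["alice"], sample))
print(identify(profiles, sample))
```

Verification answers a yes/no question about one claimed identity, while identification searches the whole group and can also return no match.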


  1. Speaker Verification

This service can be used to verify speakers for secure, fluid customer engagements in a wide range of use cases, such as call centers or interactive voice response systems.




Speaker verification can be either text-dependent or text-independent.

With text-dependent verification, a speaker says a passphrase to enroll their voice. The speaker must then repeat the same passphrase used in enrollment when attempting to be verified by the speaker verification service.

Text-independent speaker verification also requires speakers to say an activation passphrase as part of the enrollment process, but after enrollment, speakers can use everyday language when attempting to be verified by the speaker verification service.

For both text-dependent and text-independent verification, the speaker enrolls their voice by saying a passphrase from a set of predefined phrases. If the passphrase matches, the voice signature is created based on the speaker's unique biometric voice characteristics. To help prevent misuse of this service, Microsoft requires that customers actively involve users in enrollment through this activation step. The activation step indicates the speaker's active participation in creating their voice signature and is intended to help avoid the scenario in which speakers are enrolled without their awareness.
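The activation step described above amounts to a gate in front of signature creation. The following is a minimal sketch of that idea only. The phrase list, the `enroll` function, and the `EnrollmentError` type are all hypothetical, and the sketch assumes a transcript of the spoken phrase is already available; it does not reflect the service's real phrases or API.

```python
# Hypothetical placeholder phrases; the real predefined set is not given in this post.
ACTIVATION_PHRASES = [
    "this is a placeholder activation phrase",
    "another placeholder activation phrase",
]

class EnrollmentError(Exception):
    """Raised when the spoken phrase is not one of the predefined phrases."""

def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def enroll(transcript, voice_features, allowed_phrases=ACTIVATION_PHRASES):
    """Create a toy voice signature only if the activation phrase matches."""
    allowed = {normalize(p) for p in allowed_phrases}
    if normalize(transcript) not in allowed:
        raise EnrollmentError("activation phrase did not match; nothing enrolled")
    return {"voice_signature": list(voice_features)}

profile = enroll("This is a placeholder activation phrase!", [0.1, 0.2])
print(profile)
```

Because the signature is created only after the phrase check passes, audio captured without the speaker saying a predefined phrase cannot produce an enrollment in this model.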


  2. Speaker Identification

Speaker identification is used to determine an unknown speaker’s identity within a group of enrolled speakers. Speaker identification enables you to attribute speech to individual speakers and unlock value from scenarios with multiple speakers.

Speaker identification is text-independent, which means that there are no restrictions on what the speaker says in the audio besides the required initial activation passphrase to activate the enrollment.


Data Security and Privacy


Speaker enrollment data, including the speech audio for enrollment and the voice signature features, is stored in a secured system. The speech audio for enrollment is only used when the algorithm is upgraded and the features need to be extracted again. The service does not retain the speech recording or the extracted voice features that are sent to the service during the recognition phase.

You control how long data should be retained. You can create, update, and delete enrollment data for individual speakers through API calls. When the subscription is deleted, all the speaker enrollment data associated with the subscription will also be deleted.
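The retention rules in this paragraph can be pictured as a small in-memory model. This is a sketch, not the service's API: the `EnrollmentStore` class and its method names are hypothetical, chosen only to mirror the create, update, and delete operations and the subscription-level cleanup described above.

```python
class EnrollmentStore:
    """Toy model of enrollment data grouped by subscription (hypothetical)."""

    def __init__(self):
        self._by_subscription = {}

    def create(self, subscription_id, speaker_id, signature):
        speakers = self._by_subscription.setdefault(subscription_id, {})
        if speaker_id in speakers:
            raise KeyError(f"speaker {speaker_id!r} is already enrolled")
        speakers[speaker_id] = list(signature)

    def update(self, subscription_id, speaker_id, signature):
        speakers = self._by_subscription[subscription_id]
        if speaker_id not in speakers:
            raise KeyError(f"speaker {speaker_id!r} is not enrolled")
        speakers[speaker_id] = list(signature)

    def delete(self, subscription_id, speaker_id):
        # Removes one speaker's enrollment data.
        del self._by_subscription[subscription_id][speaker_id]

    def delete_subscription(self, subscription_id):
        # Deleting the subscription removes every enrollment under it.
        self._by_subscription.pop(subscription_id, None)

    def speakers(self, subscription_id):
        return sorted(self._by_subscription.get(subscription_id, {}))

store = EnrollmentStore()
store.create("sub-1", "alice", [0.9, 0.1])
store.create("sub-1", "bob", [0.1, 0.8])
store.update("sub-1", "alice", [0.8, 0.2])
store.delete("sub-1", "bob")
print(store.speakers("sub-1"))
store.delete_subscription("sub-1")
print(store.speakers("sub-1"))
```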

As with all Azure Cognitive Services resources, developers should ensure that they have received the appropriate permissions from users for Speaker Recognition.


Limited Access to Speaker Recognition


Speaker Recognition requires registration and Microsoft may limit access based on certain eligibility criteria. Customers who wish to use this service are required to submit an intake form. Access to Speaker Recognition is subject to Microsoft’s sole discretion based on eligibility criteria and a vetting process. Microsoft may require customers to reverify this information periodically.

Start from here to understand more about responsible use of Speaker Recognition.


Empowering Microsoft Partner [24]7.ai with Speaker Recognition


Microsoft partner [24]™ makes every customer-brand interaction more satisfying and cost efficient, driving customer loyalty, sales growth, and agent productivity for the world’s leading brands. The company combines deep vertical expertise, human insight, and years of contact center experience to ensure consistent, easy, personalized conversations across channels and time. [24] is transforming the digital customer experience (CX) through its cloud-based customer engagement platform, agent services, and managed services.

“We partnered with Microsoft in voice biometrics, not just for the company’s technology, which, based on our testing, is top notch, but also because Microsoft is known for safeguarding the security and privacy of its customers—a key consideration with speaker recognition software.”

—John Gaffney, VP, Voice Commerce Product Management, [24]

[24]™ incorporates Speaker Recognition technology into its [24]7 Voices™ product, an interactive voice response (IVR) platform that supports natural, intent-based customer interactions, boosts self-serve automation, and blends seamlessly with voice agents and digital channels. [24]7 Voices is itself part of the company’s [24] Engagement Cloud™ platform, a recognized industry leader in conversational AI.




By using Speaker Recognition, [24] provides the following benefits to its [24]7 Voices clients and their customers:

  • A better customer experience: Voice biometrics enables more secure and streamlined customer journeys. Because it gives organizations confidence that the speaker is who they say they are, [24] clients avoid having to transfer customers to an agent for additional security screening. That means less hassle and wasted time for callers, and the longer they stay in the IVR system, the greater the opportunity for them to self-serve.
  • Stronger authentication and increased security: Voice biometrics drastically reduces the risk of theft or hacks that are prevalent with passwords and PINs. Again, that’s a win for [24] clients and their customers.
  • Significant cost savings: Voice biometrics reduces operational and fraud costs and, by increasing IVR containment, enables [24] clients to provide more self-serve customer options within the IVR. It decreases handling time by agents, minimizes transfers, and reduces the burden of maintaining and resetting passwords, which cuts down the IT time and staffing costs spent on backend systems and integrations.

Although [24] clients access the speaker identification and verification components separately, the company considers them as living under a single voice biometrics umbrella.




It’s a Win-Win-Win

In a nutshell: Every time [24] clients reduce agent talk time, it saves them money AND improves their customers’ experience.

The trick is to do it really well, and safely. As noted in the quote above, [24] has great confidence in the Speaker Recognition technology. But technology alone isn’t enough. Deploying voice biometrics in today’s regulatory environment is challenging, to say the least, and in the end [24] was won over by Microsoft’s long-standing and well-known commitment to privacy and security.

Learn more about The building blocks of Microsoft’s responsible AI program and Azure AI – Cognitive Services, Speaker Recognition



