Tuesday, April 19, 2022

Microsoft Purview Information Protection: General availability of 51 new sensitive information types

Today we are introducing Microsoft Purview—a comprehensive set of solutions that help you govern, protect, and manage your entire data estate. This new brand family combines the capabilities of the former Azure Purview and the Microsoft 365 Compliance portfolio that customers already rely on, providing unified data governance and risk management for your organization. As part of this announcement, Microsoft Information Protection will now be called Microsoft Purview Information Protection. 


Enterprises today face the challenge of classifying large volumes of data, especially personal data, which is required by privacy regulations and laws worldwide. At Microsoft, our goal is to provide a built-in, intelligent, unified, and extensible solution to protect sensitive data across your digital estate – in Microsoft 365 cloud services, on-premises, third-party SaaS applications, and more. With Microsoft Purview Information Protection, we are building a unified set of capabilities for classification, labeling, and protection not only in Office apps but also in other popular productivity services where information resides (e.g., SharePoint Online, Exchange Online, and Teams), as well as endpoint devices.   


At Microsoft Ignite in November 2021, we announced multiple enhancements to automatically classify information. We are now announcing the general availability of our capability to detect named entities, which span person names, physical addresses, and medical terms and conditions. This dramatically improves the speed and comprehensiveness of detection and classification of sensitive personal data as well as industry-specific regulatory data. 


Named entity detection identifies entities, such as person names, that can’t be easily matched with a regular expression because they don’t typically follow a set pattern. Named entities can help improve:

  • Detection of sensitive information related to industry regulations like the U.S. Health Insurance Portability and Accountability Act (HIPAA) through the ability to detect medical terms, such as drug names and medical conditions.
  • Detection of personal data to aid in meeting regulations such as GDPR. 
  • Association of a person’s name (personal data) with sensitive data that is linked to that person, such as a U.S. Social Security number or credit card information, which together form personally identifiable information (PII).
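
To illustrate why pattern matching alone falls short for names, here is a minimal Python sketch. Both patterns are simplified assumptions written for this example; they are not the definitions that Microsoft Purview uses for its SITs. The SSN-shaped pattern works because the data has a fixed shape, while the naive "two capitalized words" pattern both misfires and misses real names.

```python
import re

# Simplified SSN-shaped pattern (illustration only): 3 digits, 2 digits,
# 4 digits, with an optional hyphen or space between the groups.
SSN_LIKE = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Naive attempt at a "person name" pattern: two capitalized words in a row.
NAIVE_NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def find_ssn_like(text: str) -> list[str]:
    """Return every substring that has the shape of a U.S. SSN."""
    return SSN_LIKE.findall(text)


def find_naive_names(text: str) -> list[str]:
    """Return every pair of adjacent capitalized words."""
    return NAIVE_NAME.findall(text)


sample = "Patient Maria de la Cruz (SSN 123-45-6789) was seen by Dr. Nguyen."

print(find_ssn_like(sample))     # ['123-45-6789']
print(find_naive_names(sample))  # ['Patient Maria']
```

The fixed-format value is found cleanly, but the name pattern returns "Patient Maria" (a false positive) and misses both "Maria de la Cruz" and "Nguyen". This is the gap that named entity detection is intended to close.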

This release adds 51 new named entity Sensitive Information Types (SITs), available in Microsoft Priva as well as the following Microsoft Purview solutions: Data Loss Prevention (DLP), Information Protection (i.e., auto-labeling), Data Lifecycle Management, Insider Risk Management, Records Management, eDiscovery Premium, and Exact Data Match. This builds upon the 200+ SITs that are already available. Named entities are supported in Exchange emails, Teams chats, SharePoint Online, OneDrive for Business, Defender for Cloud Apps, endpoint devices, and Office clients.


Both bundled and unbundled entities will be displayed as Sensitive Information Types. Bundled entities are differentiated from unbundled entities by their Type, which indicates the type of classifier, such as Exact Data Match.  


Bundled entities are associated with Type: BundledEntity. These include:

  • All Full Names 
  • All Physical Addresses 
  • All Medical Terms and Conditions 

Unbundled entities are associated with Type: Entity. They are more granular subsets of the All Physical Addresses and All Medical Terms and Conditions bundled entities. For example, these include country-specific addresses or specific medical terms and conditions that you can individually select:

  • Country-level addresses, such as Australia Physical Addresses. We will offer 38 country-specific address SITs, covering the US, EU, and other regions.
  • Medical terms and conditions. We will support 10 specific SITs: Blood Test Terms, Types of Medication, Diseases, Brand Medication Names, Generic Medication Names, Impairments Listed In The U.S. Disability Evaluation Under Social Security, Lab Test Terms, Lifestyles That Relate To Medical Conditions, Medical Specialties, and Surgical Procedures.

All named entity SITs are listed along with all other SITs on the Sensitive info types page within the Data classification solution. As with any other SIT, the test option is supported, allowing you to validate whether an uploaded data file contains named entities; however, at this time, the copy and edit features are not supported.




In addition, we are announcing the general availability of 10 enhanced unified authoring policy templates, which can be used in DLP, auto-labeling, and Data Lifecycle Management solutions. These policy templates are updates to existing unified authoring policy templates, such as HIPAA and GDPR. The enhanced policy templates include named entities in their definition and can improve the ability to identify and protect data as required by regulations. They can be easily identified by the “Enhanced” suffix in their names and are included in each category (Financial, Medical and health, Privacy) along with the other existing policy templates; for ease of use, they are also included in the separate category “Enhanced”.

  • Australia Health Records Act (HRIP Act) Enhanced 
  • Australia Privacy Act Enhanced 
  • General Data Protection Regulation (GDPR) Enhanced 
  • Japan Personally Identifiable Information (PII) Data Enhanced 
  • Japan Protection of Personal Information Enhanced 
  • U.S. Gramm-Leach-Bliley Act (GLBA) Enhanced 
  • U.S. Health Insurance Act (HIPAA) Enhanced 
  • U.S. Patriot Act Enhanced 
  • U.S. Personally Identifiable Information (PII) Data Enhanced 
  • U.S. State Breach Notification Laws Enhanced 



Together, these updates improve detection of personal data and reduce false positives when sensitive information types such as U.S. Social Security Number (SSN) are found in combination with a named entity, such as a person name. The ability to detect a person’s name, address, and/or medical terms and conditions aids in detecting and protecting personal data, which in most countries is highly regulated.


Best practices for creating a new policy or editing an existing policy with a named entity:

  1. Consider the data type and format of the data file being classified, as well as the regulatory requirements. For a “strongly defined” SIT such as the U.S. Social Security Number (SSN), it’s best to use a lower instance count in the policy. For example, if you are trying to detect a list of SSNs in structured data such as a spreadsheet, then it’s best to define a policy that is optimized for the confidence and frequency of occurrences. In this case, requiring a minimum instance count of 3 or 5 would be better than a larger instance count. If a keyword required by the SSN definition is present only in the column header, then only the first few SSNs in the column are likely to fall within the required character proximity of that corroborative keyword. Requiring a larger instance count (e.g., 100 or even 500) would likely cause the policy not to match.
  2. For a named entity SIT, such as All Full Names, it’s best to set a larger instance count such as 10 or 50. If both person names and SSNs are detected together, it’s more likely that the SSNs are truly SSNs, and we reduce the risk that the policy doesn’t trigger because not enough SSNs are detected.
  3. Auto-labeling simulations can be leveraged to further fine-tune accuracy. Adjust the instance counts and confidence levels defined in your custom policies or in the enhanced template conditions across simulations before enabling a DLP or auto-labeling policy containing named entities in production.
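
The reasoning behind the instance counts in steps 1 and 2 can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how the service evaluates policies: the 300-character window, the flattened spreadsheet layout, and the `Condition`, `count_corroborated`, and `policy_matches` names are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Assumed size of the character window in which a keyword must appear
# near a match to count as corroborative evidence (hypothetical value).
PROXIMITY_WINDOW = 300

SSN = "U.S. Social Security Number (SSN)"
FULL_NAMES = "All Full Names"


@dataclass
class Condition:
    """One policy condition: a SIT and its minimum instance count."""
    sit_name: str
    min_count: int


def count_corroborated(match_offsets, keyword_offsets, window=PROXIMITY_WINDOW):
    """Count matches that have at least one keyword within `window` characters."""
    return sum(
        1
        for m in match_offsets
        if any(abs(m - k) <= window for k in keyword_offsets)
    )


def policy_matches(counts, conditions):
    """The policy matches only if every condition reaches its minimum count."""
    return all(counts.get(c.sit_name, 0) >= c.min_count for c in conditions)


# A spreadsheet column flattened to text: the header keyword "SSN" sits at
# offset 0, followed by 500 values that each occupy 12 characters.
keyword_offsets = [0]
ssn_offsets = [4 + 12 * i for i in range(500)]

counts = {
    SSN: count_corroborated(ssn_offsets, keyword_offsets),
    FULL_NAMES: 500,  # assume one full name per row
}

low_threshold = [Condition(SSN, 5), Condition(FULL_NAMES, 10)]
high_threshold = [Condition(SSN, 100), Condition(FULL_NAMES, 10)]

print(counts[SSN])                             # 25
print(policy_matches(counts, low_threshold))   # True
print(policy_matches(counts, high_threshold))  # False
```

In this model only 25 of the 500 SSNs sit close enough to the header keyword, so a minimum count of 5 matches while a minimum count of 100 does not, even though the column is full of SSNs. Pairing the low SSN count with a higher count for All Full Names keeps the policy selective.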


When a file located in SharePoint Online or OneDrive for Business is scanned for sensitive information and a named entity is detected, both the unbundled and bundled SIT will display as matching in Content Explorer, as unbundled SITs are a subset of bundled SITs. For example, in a file where 15 physical addresses are found, 14 of them U.S. addresses and one a French address, the country-level SITs display as matching along with the bundled All Physical Addresses SIT.
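
The subset relationship can be sketched as a simple roll-up in Python. This is an illustrative model only: the `BUNDLES` mapping and `content_explorer_counts` function are hypothetical, and the U.S. and France SIT names are assumed to follow the same naming pattern as Australia Physical Addresses.

```python
from collections import Counter

# Hypothetical mapping of a bundled SIT to some of its unbundled members.
BUNDLES = {
    "All Physical Addresses": {
        "U.S. Physical Addresses",
        "France Physical Addresses",
        "Australia Physical Addresses",
    },
}


def content_explorer_counts(detections: list[str]) -> dict[str, int]:
    """Count each unbundled SIT, then add the total for its bundled parent."""
    counts = Counter(detections)
    for bundle, members in BUNDLES.items():
        total = sum(counts[m] for m in members)
        if total:
            counts[bundle] = total
    return dict(counts)


# 14 U.S. addresses and one French address detected in the same file.
detections = ["U.S. Physical Addresses"] * 14 + ["France Physical Addresses"]
result = content_explorer_counts(detections)

for sit_name, count in result.items():
    print(f"{sit_name}: {count}")
```

Under these assumptions, the file reports 14 matches for the U.S. SIT, one for the French SIT, and 15 for All Physical Addresses.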




Now that named entities are generally available, we encourage all customers with the requisite licenses to review in Content Explorer where named entities are detected in locations such as SharePoint Online and OneDrive for Business. We also encourage you to consider adding named entities to your policies, either by leveraging one of the new enhanced policy authoring templates, such as the General Data Protection Regulation (GDPR) Enhanced template, or by manually configuring an existing or new custom policy.


Note that an E5 or A5 license is required for accessing named entities, which are initially being released to commercial cloud customers; government clouds (GCC, GCC-High, Department of Defense) will be supported at a later date.

Learn more about Microsoft Purview Information Protection and named entities here. You can quickly get started with our existing enhanced templates and customize them as needed. We are constantly extending our product capabilities to help organizations more easily classify and protect sensitive data as required for regulatory compliance.

We look forward to hearing your feedback!  


Get Started   

We are happy to share that there is now an easier way for you to try Microsoft Purview solutions directly in the Microsoft Purview compliance portal with a free trial. By enabling the trial in the compliance portal, you can quickly start using all capabilities of Microsoft Purview, including Insider Risk Management, Records Management, Audit, eDiscovery, Communication Compliance, Information Protection, Data Lifecycle Management, Data Loss Prevention, and Compliance Manager.   

Visit your Microsoft Purview compliance portal for more details or check out the Microsoft Purview solutions trial (an active M365 E3 subscription is required as a prerequisite).   


Posted at https://sl.advdat.com/3Euef9d