Thursday, October 28, 2021

Reference Architecture: Using Informatica Intelligent Data Management Cloud with Azure Synapse

This article has been co-written by Kiran Subbarao, Director, Partner Solutions at Informatica. You can reach Kiran on


Business Challenge

Traditional on-premises data warehouses are difficult and expensive to scale in a world of exponential data growth. As organizations modernize and evaluate a move to the cloud, they are looking at cloud-native architectures. Such architectures provide elasticity, native support for diverse data, and compelling performance at a fraction of the cost of on-premises systems, while eliminating the complexity of conventional data warehouses.


Companies implementing large-scale data warehouses today face a number of challenges:

  • Data is diverse and distributed across departments and systems, including applications, data warehouses, and data lakes, making it difficult to know exactly what data you have and where it resides. As data sources proliferate, the data landscape becomes even more complex.
  • Migrating this diverse, distributed data into a modern analytics system is a challenge in itself.
  • Data quality issues arise from duplicate, disorganized, unsecured, or inaccurate data in the source systems. Inaccurate information in a report skews the numbers and leads to incorrect conclusions, undermining the credibility of analytics.

Informatica Intelligent Data Management Cloud (IDMC) can help organizations that have adopted Azure Synapse discover data, accelerate data migration, and build trust by ensuring data quality. This document outlines a reference architecture that companies can use to deploy a solution with IDMC and Azure Synapse.




Discover Data Using an Intelligent Data Catalog

To address the challenge of discovering data across diverse and distributed systems, Informatica Enterprise Data Catalog gives data analysts and IT users powerful semantic search, dynamic facets for filtering search results, detailed data lineage, profiling statistics, data quality scorecards, holistic relationship views, and data similarity recommendations. The catalog provides more than 100 scanners that extract metadata from databases, file systems, applications, BI tools, mainframes, and data warehouses, covering a wide range of on-premises and cloud systems such as SAP, MongoDB, Salesforce, Teradata, Cassandra, and Kafka.


An intelligent, integrated enterprise data catalog enables data engineers and developers to easily search for data assets across enterprise systems, understand datasets, view data lineage, and create data pipelines to ingest data for analytics into Azure Synapse.





Informatica Intelligent Data Management Cloud is a SaaS offering from Informatica hosted on Azure that provides Data Cataloging, Data and Application Integration, Data Quality, Master Data Management and Data Governance.


Mass Ingest Data into Synapse

Digital transformation is fueled by data, which is key to accelerating the business and gaining competitive advantage. To get started, it is essential to quickly ingest large volumes of data from a variety of sources into Azure Synapse, and then to continuously capture changes (deltas) from the transactional systems to stay up to date with the latest information. This creates the need for mass data ingestion with change data capture (CDC) capability to support ELT workloads.


Once the data is discovered, it can be ingested into a data lake or dedicated SQL pool using the Informatica Mass Ingestion Service, and incremental changes can be captured using the CDC option.


Initial Load

Source data can be read at a single point in time and loaded into Azure Synapse. Mass ingestion helps migrate data from an on-premises or cloud data system to a data lake or a dedicated SQL pool. The initial load materializes the target, to which incremental changes can later be applied.
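To make the idea of an initial load concrete, here is a minimal, generic Python sketch; it does not use any Informatica API. It assumes a hypothetical source table whose rows carry a change sequence number (for example, a database log position) and models the target as an in-memory dictionary. `SourceRow` and `initial_load` are illustrative names. The snapshot is read once, and the highest sequence number seen is returned as a watermark from which incremental changes can start. Using the highest row sequence number is a simplification; a production tool would typically record the source log position at the moment the snapshot is taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRow:
    """A row in a hypothetical source table, stamped with the sequence
    number (e.g. log position) of the last change that touched it."""
    key: int
    value: str
    seq: int


def initial_load(source_rows, target):
    """Copy a point-in-time snapshot of the source into an empty target.

    Returns the highest sequence number seen, which an incremental job can
    use as its starting watermark (a simplification, see the lead-in).
    """
    if target:
        raise ValueError("initial load expects an empty target")
    snapshot = list(source_rows)  # read the source once, at a single point in time
    for row in snapshot:
        target[row.key] = row.value
    return max((row.seq for row in snapshot), default=0)


if __name__ == "__main__":
    source = [SourceRow(1, "alice", 10), SourceRow(2, "bob", 12)]
    target = {}
    watermark = initial_load(source, target)
    print(target, watermark)  # {1: 'alice', 2: 'bob'} 12
```

Refusing a non-empty target keeps the initial load from silently mixing old and new data; once the target is materialized, only incremental changes should modify it.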


Incremental Load

Incremental load captures changes (deltas) and continuously propagates them to the target. The job captures the changes that have occurred since it last ran, or since a specified starting point. The mass ingestion service helps keep track of changes and propagate updates to reporting and analytics systems, so that you can make informed business decisions based on the latest data.
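The following generic Python sketch illustrates how an incremental job can apply captured changes. It assumes a hypothetical change log of insert, update, and delete events, each tagged with an increasing sequence number, and an in-memory dictionary as the target; `ChangeEvent` and `apply_changes` are illustrative names, not part of any Informatica API. Events at or below the last watermark are skipped, so a rerun does not apply a change twice, and the new watermark is returned for the next run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a hypothetical source change log."""
    seq: int                      # position in the change log, increasing
    op: str                       # "insert", "update" or "delete"
    key: int
    value: Optional[str] = None   # unused for deletes


def apply_changes(events, target, watermark):
    """Apply changes that occurred after `watermark` to `target`, in log order.

    Returns the new watermark so the next run resumes where this one stopped.
    """
    for event in sorted(events, key=lambda e: e.seq):
        if event.seq <= watermark:
            continue  # already applied by the initial load or an earlier run
        if event.op in ("insert", "update"):
            target[event.key] = event.value  # upsert keeps the latest value
        elif event.op == "delete":
            target.pop(event.key, None)      # tolerate already-deleted keys
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
        watermark = event.seq
    return watermark


if __name__ == "__main__":
    target = {1: "alice", 2: "bob"}  # state after the initial load
    log = [
        ChangeEvent(12, "update", 2, "bob"),     # already reflected in target
        ChangeEvent(13, "update", 1, "alicia"),
        ChangeEvent(14, "delete", 2),
        ChangeEvent(15, "insert", 3, "carol"),
    ]
    watermark = apply_changes(log, target, watermark=12)
    print(target, watermark)  # {1: 'alicia', 3: 'carol'} 15
```

Treating inserts and updates alike as upserts, and ignoring deletes of missing keys, keeps the target correct even if an event is replayed, which is a common design choice for CDC targets.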





Enrich Data with Cloud Data Quality

Discovering data and making it available for analytics is a first step, but users also expect the data to be of high quality so that they can trust it for decision-making. Informatica Cloud Data Quality provides data enrichment capabilities that help cleanse and maintain high-quality data regardless of its size or format. It ensures that address information is validated and improved, business data is profiled and cleansed, data governance practices are implemented, and other data quality requirements are met. Informatica Cloud Data Quality provides a unified platform for delivering high-quality data to all business initiatives and applications.


Data engineers and developers can build data quality pipelines in a code-free interface that lets them define simple or complex data quality rules. Using the enriched data, analysts can also measure six dimensions of data quality: completeness, accuracy, consistency, validity, uniqueness, and integrity.
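As a rough illustration of how some of these dimensions can be scored, the Python sketch below computes completeness, uniqueness, and validity for one field as ratios between 0 and 1. These formulas are common, simplified definitions chosen for this example, not the rule logic of Informatica Cloud Data Quality, and the sample records and email pattern are hypothetical. Accuracy, consistency, and integrity are left out because they need reference data or cross-table checks.

```python
import re


def _filled(records, field):
    """Non-empty values of `field` (missing keys, None and "" are skipped)."""
    return [r[field] for r in records if r.get(field) not in (None, "")]


def completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 1.0  # an empty dataset has nothing missing
    return len(_filled(records, field)) / len(records)


def uniqueness(records, field):
    """Share of non-empty values of `field` that are distinct."""
    values = _filled(records, field)
    return len(set(values)) / len(values) if values else 1.0


def validity(records, field, pattern):
    """Share of non-empty values of `field` that fully match `pattern`."""
    values = _filled(records, field)
    if not values:
        return 1.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)


if __name__ == "__main__":
    customers = [
        {"id": "1", "email": "ann@example.com"},
        {"id": "2", "email": "ann@example.com"},   # duplicate value
        {"id": "3", "email": "not-an-email"},      # invalid value
        {"id": "4", "email": ""},                  # missing value
    ]
    email_rule = r"[^@\s]+@[^@\s]+\.[a-z]{2,}"  # deliberately simple rule
    print(completeness(customers, "email"))          # 0.75
    print(uniqueness(customers, "email"))            # 0.6666666666666666
    print(validity(customers, "email", email_rule))  # 0.6666666666666666
```

Measuring uniqueness and validity only over non-empty values keeps each score focused on one dimension, so a missing email lowers completeness without also counting as a duplicate or an invalid value.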





You can also standardize the data on Azure Synapse with a shared data language and build common schemas for applications using the Common Data Model connector in IDMC.



Informatica Intelligent Data Management Cloud for Azure enables you to discover the full breadth of your analytics data, ingest that data into Azure Synapse, and cleanse and enrich it so that the data being consumed is of the highest quality and the insights delivered provide a trusted foundation for sound business decisions.


To learn more about Informatica Intelligent Data Management solutions for Azure, check out the following resources:
