Wednesday, October 20, 2021

Mapping Data Flows gets new native connectors

The Microsoft Data Integration team has just released two new connectors for Mapping Data Flows. If you are using Synapse Analytics, you can now connect directly to your AWS S3 buckets for data transformations. In both ADF & Synapse, you can now natively connect to your Azure Data Explorer clusters in mapping data flows.



Mapping Data Flows provides scale-out data transformation in the cloud in Azure Data Factory and Azure Synapse Analytics. With these additional connectors, you can build ETL patterns at Spark scale in a code-free design environment without ever touching the Spark compute. Azure Integration Runtimes allow you to define the Spark environment and provide a serverless Spark compute for your data transformation pipelines.


In this example, I'm using the ADX connector as a source in my data flow to read NYC taxi data. I can perform any of the hundreds of data transformation patterns to reshape and transform the data. First, I am able to sample, view, and explore the source ADX data and associated data profiling stats.
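To give a feel for what column profiling stats describe, here is a minimal Python sketch. It does not show how ADF computes or displays its statistics. The sample values, the `profile_column` helper, and the choice of statistics are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical sample of one source column; None marks a missing value.
trip_distances = [2.0, 3.5, None, 1.0, 4.0]


def profile_column(values):
    """Return simple profiling statistics for a numeric column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


stats = profile_column(trip_distances)
print(stats)
```

A quick look at counts, nulls, and ranges like this is a cheap way to catch bad source data before building the rest of the flow.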



Next, I'm going to aggregate the trip distance data grouped by vendor ID and round the average into a nicely formatted new column called "avgDistance". After that, I'll map the vendor ID column to the full vendor name in a new column using the Derived Column transformation.
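If you think in code, the logic of those two steps can be sketched in plain Python. This only illustrates what the Aggregate and Derived Column steps compute; it is not what ADF generates or runs on Spark. The sample rows, the column names `vendorID`, `tripDistance`, and `vendorName`, the two-decimal rounding, and the vendor-name lookup are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the NYC taxi source data.
trips = [
    {"vendorID": 1, "tripDistance": 2.0},
    {"vendorID": 1, "tripDistance": 3.5},
    {"vendorID": 2, "tripDistance": 1.0},
    {"vendorID": 2, "tripDistance": 4.0},
    {"vendorID": 2, "tripDistance": 2.5},
]

# Assumed lookup from vendor ID to full vendor name.
VENDOR_NAMES = {1: "Creative Mobile Technologies", 2: "VeriFone Inc."}


def aggregate_by_vendor(rows):
    """Group by vendorID and compute the rounded average trip distance."""
    totals = defaultdict(lambda: [0.0, 0])  # vendorID -> [sum, row count]
    for row in rows:
        acc = totals[row["vendorID"]]
        acc[0] += row["tripDistance"]
        acc[1] += 1
    return [
        {"vendorID": vid, "avgDistance": round(total / count, 2)}
        for vid, (total, count) in sorted(totals.items())
    ]


def add_vendor_name(rows):
    """Derived column: add the full vendor name for each vendor ID."""
    return [
        {**row, "vendorName": VENDOR_NAMES.get(row["vendorID"], "Unknown")}
        for row in rows
    ]


result = add_vendor_name(aggregate_by_vendor(trips))
for row in result:
    print(row)
```

Each output row carries the vendor ID, the average trip distance, and the full vendor name.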



I want to create a new table on the fly in ADF that can store my aggregated analytics, so I tell ADF to create a new table called "aggtable" in the ADX database. The data flow Sink will define the new table with the vendor ID, average trip distance, and full vendor name. After executing this data flow from a pipeline, I can see the total number of rows written, columns, timing of each step, partitions created in Spark, and many other properties from the pipeline run.



Once the run is complete, you'll be able to see your new data transformed into a new table inside of Azure Data Explorer.



