Wednesday, November 24, 2021

Use ADF to Turn Hierarchical Structures into Relational Data

Azure Data Factory and Azure Synapse Analytics pipelines include the powerful code-free data transformation feature called Mapping Data Flows. In this post, I'll show you how you can use data flows to visually transform hierarchies into relational data for downstream consumption by destinations that require data to be shaped like a table.

 

The data I'm going to work with is a JSON source and I want to grab the coordinates and type properties from the geometry field. Note that the coordinates property inside the geometry structure is an array.

 

MarkKromer_0-1637792192038.png

 

The way to form this into a relational structure is by building a mapping data flow that first reshapes that data into your desired column format and then takes the array and unrolls that into a series of denormalized rows.

 

MarkKromer_1-1637792493412.png

 

The FlattenStructure transformation in the diagram above is a Derived Column. To turn your structure into a relational table, just pick the name of the struct in the Derived Column and use a column pattern to define which hierarchy you wish to flatten.

 

MarkKromer_3-1637793008306.png

 

What this will produce is an updated projection in your data flow with the 2 properties inside of the geometry structure now as separate columns.

 

MarkKromer_4-1637793153993.png

 

That leaves the coordinates array which requires the Flatten transformation. Arrays are trickier when moving from hierarchical to relational because each value in that array will need to be added to existing rows outside of that array, resulting in a denormalized schema.

 

In the Flatten transformation, you will pick the coordinates array for unroll and then define the target projection.

 

MarkKromer_5-1637793315766.png

 

That's it! You now have a fully flattened set of data from your JSON structures that can be consumed by relational targets.

MarkKromer_6-1637793716580.png

 

Click on these documentation links to learn more about Mapping Data Flows in ADF and Synapse: Flatten transformation, column patterns in Derived Column.

 

Posted at https://sl.advdat.com/3DQxuZv