Tuesday, December 14, 2021

Azure Machine Learning Service Designer - Data Engineering

How to do data engineering in Azure Machine Learning Service using the designer

Prerequisites

  • Azure Account
  • Azure Storage
  • Azure Machine Learning Service

Introduction

  • This tutorial only shows how to do data engineering in Azure Machine Learning Service using the designer.
  • The data used is the Titanic dataset, a well-known open-source dataset in machine learning.
  • Every task (flow item) has parameters and outputs.
  • After a run, the output of every task can be visualized.
  • What the output contains depends on the task.

Overall flow

 

designer1.jpg

 

  • Above is the overall experiment.
  • It is built in a low-code environment.
  • All components are added by drag and drop.

What's done

Bring in the dataset

Select columns in dataset

 

designer2.jpg
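For readers who want to reproduce this step outside the designer, Select Columns in Dataset behaves much like picking a subset of DataFrame columns. This is a minimal pandas sketch, assuming the usual Kaggle Titanic column names; the rows are made up, and the exact columns kept in the experiment are only visible in the screenshot, so `selected_columns` is a hypothetical choice.

```python
import pandas as pd

# Hypothetical rows shaped like the Titanic data (values are made up)
titanic = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["Smith, Mr. John", "Doe, Mrs. Jane", "Roe, Miss. Ann"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
    "Ticket": ["A1", "B2", "C3"],
    "Fare": [7.25, 71.28, 7.93],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", "C", "S"],
})

# Hypothetical selection: keep the label and candidate features, drop ID-like columns
selected_columns = ["Survived", "Pclass", "Name", "Sex", "Age", "Fare", "Cabin", "Embarked"]
selected = titanic[selected_columns]
print(selected.columns.tolist())
```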

 

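Execute Python Script - Correlation Chart

    import seaborn as sn
    import matplotlib.pyplot as plt
    from azureml.core import Run

    # Entry point called by the Execute Python Script component;
    # dataframe1 is the dataset connected to the first input port
    def azureml_main(dataframe1=None, dataframe2=None):
        # Correlation is only defined for numeric columns
        corrMatrix = dataframe1.select_dtypes(include="number").corr()
        print(corrMatrix)

        # Save the heatmap before any plt.show() call, which can leave an empty figure
        plt.figure(figsize=(10, 8))
        sn.heatmap(corrMatrix, annot=True)
        img_file = "corrchart1.png"
        plt.savefig(img_file)

        # Upload the image to the run so it can be viewed with the run's outputs
        run = Run.get_context(allow_offline=True)
        run.upload_file(f"graphics/{img_file}", img_file)

        # Pass the input dataset through unchanged
        return dataframe1,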
 

designer3.jpg

 

  • Output

designer4.jpg

 

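Execute Python Script - Covariance Chart

    import seaborn as sn
    import matplotlib.pyplot as plt
    from azureml.core import Run

    # Each Execute Python Script component runs separately, so it needs its own imports
    def azureml_main(dataframe1=None, dataframe2=None):
        # Covariance, like correlation, only uses numeric columns
        covMatrix = dataframe1.select_dtypes(include="number").cov()
        print(covMatrix)

        plt.figure(figsize=(10, 8))
        sn.heatmap(covMatrix, annot=True)
        img_file = "covchart1.png"
        plt.savefig(img_file)
        run = Run.get_context(allow_offline=True)
        run.upload_file(f"graphics/{img_file}", img_file)

        return dataframe1,

  • Code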

designer5.jpg

 

  • Output
 

designer6.jpg

 


 

Remove duplicate rows

designer7.jpg
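Remove Duplicate Rows takes a set of key columns and an option to retain the first duplicate row. A rough pandas equivalent is `drop_duplicates`; this sketch assumes all listed columns act as the key and that the first occurrence is kept. The data and `key_columns` are hypothetical.

```python
import pandas as pd

# Hypothetical data containing one exact duplicate row
df = pd.DataFrame({
    "Name": ["A", "B", "A", "C"],
    "Age": [22.0, 38.0, 22.0, 26.0],
    "Fare": [7.25, 71.28, 7.25, 7.93],
})

# Rows count as duplicates when all key columns match; keep the first occurrence
key_columns = ["Name", "Age", "Fare"]
deduplicated = df.drop_duplicates(subset=key_columns, keep="first").reset_index(drop=True)
print(deduplicated)
```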

 

Normalize data

designer8.jpg
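Normalize Data offers several transformation methods, for example ZScore or MinMax. Assuming MinMax was chosen (the screenshot shows the actual setting), each selected column is rescaled to [0, 1] with (x - min) / (max - min). A small pandas sketch on hypothetical values; `min_max_normalize` is an illustrative helper, not a designer API.

```python
import pandas as pd

df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 50.0]})

def min_max_normalize(frame, columns):
    """Rescale each listed column to the [0, 1] range: (x - min) / (max - min)."""
    out = frame.copy()
    for col in columns:
        col_min, col_max = out[col].min(), out[col].max()
        span = col_max - col_min
        # Guard against constant columns, which would divide by zero
        out[col] = 0.0 if span == 0 else (out[col] - col_min) / span
    return out

normalized = min_max_normalize(df, ["Age", "Fare"])
print(normalized)
```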

 

Group data in bins

designer9.jpg
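Group Data into Bins turns a numeric column into a small number of groups, using modes such as quantiles or equal width. The sketch below shows both ideas with pandas on a hypothetical Age column; the bin count of 4 is an assumption, not the value used in the experiment.

```python
import pandas as pd

ages = pd.Series([2.0, 15.0, 22.0, 28.0, 35.0, 41.0, 54.0, 80.0], name="Age")

# Quantile binning: each bin receives roughly the same number of rows
quantile_bins = pd.qcut(ages, q=4, labels=False)

# Equal-width binning: the value range is split into intervals of equal size
equal_width_bins = pd.cut(ages, bins=4, labels=False)

print(pd.DataFrame({"Age": ages, "quantile_bin": quantile_bins, "equal_width_bin": equal_width_bins}))
```

Quantile bins balance the row counts, while equal-width bins keep the value ranges comparable but can leave some bins sparse when the data is skewed.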

 

Edit Metadata to convert String to Categorical column - Name

designer10.jpg

 

Edit Metadata to convert String to Categorical column - Cabin

designer11.jpg

 

Edit Metadata to convert String to Categorical column - Embarked

designer12.jpg
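The three Edit Metadata steps mark the Name, Cabin and Embarked string columns as categorical. In pandas the closest analogue is converting the columns to the `category` dtype; this is a sketch on hypothetical rows, not what the designer does internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Doe, Mrs. Jane"],
    "Cabin": ["C85", None],
    "Embarked": ["S", "C"],
})

# Mark each string column as categorical; missing values stay missing
for col in ["Name", "Cabin", "Embarked"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
print(df["Embarked"].cat.categories.tolist())
```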

 

Clip Values - limit outliers to help avoid overfitting

designer13.jpg
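Clip Values replaces values above an upper threshold (peaks) and/or below a lower threshold (subpeaks), with thresholds given as constants or percentiles. This pandas sketch assumes percentile thresholds of 5 and 95 on a hypothetical Fare column; the thresholds actually used are in the screenshot.

```python
import pandas as pd

fares = pd.Series([7.25, 8.05, 13.0, 26.0, 71.28, 512.33], name="Fare")

# Hypothetical percentile thresholds for subpeaks (low) and peaks (high)
lower = fares.quantile(0.05)
upper = fares.quantile(0.95)

# Values outside [lower, upper] are replaced by the nearest threshold
clipped = fares.clip(lower=lower, upper=upper)
print(clipped)
```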

 

Clean missing data

designer14.jpg
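Clean Missing Data can, among other modes, replace missing values with the column mean or mode, or remove the entire row. A minimal pandas sketch, assuming mean substitution for the numeric Age column and mode substitution for Embarked; these choices are hypothetical. Using `dropna()` instead would correspond to removing the rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
})

cleaned = df.copy()
# Numeric column: replace missing values with the column mean
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].mean())
# Categorical column: replace missing values with the most frequent value (mode)
cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode()[0])
print(cleaned)
```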

 

Apply math operations

designer15.jpg
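Apply Math Operation applies a function to the selected columns and can either append the result or replace the original values. Which operation the experiment uses is only visible in the screenshot, so this sketch assumes a hypothetical log(1 + x) transform on Fare, appended as a new column, to show the idea in pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Fare": [0.0, 7.25, 71.28, 512.33]})

# Hypothetical operation: log(1 + x) compresses the long right tail of Fare.
# Appending a new column keeps the original values available downstream.
df["Fare_log1p"] = np.log1p(df["Fare"])
print(df)
```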

 

Split data into training and test data

designer18.jpg
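Split Data sends a fraction of the rows to its first output (training) and the rest to the second (test), optionally with a random seed. A pandas sketch of a random split, assuming a hypothetical fraction of 0.7 and seed of 42:

```python
import pandas as pd

df = pd.DataFrame({"Age": range(10), "Survived": [0, 1] * 5})

# Fraction of rows for the first (training) output; 0.7 and the seed are hypothetical
train_fraction = 0.7
train = df.sample(frac=train_fraction, random_state=42)
# Every row not sampled for training goes to the test output
test = df.drop(train.index)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so repeated runs train and score on the same rows.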

 

Bring in the model to train

designer16.jpg

 

Train model

designer17.jpg

 

Score model

designer19.jpg

 

  • Output

designer20.jpg
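Train Model fits the chosen algorithm on the training split, and Score Model adds predictions to the test split. The algorithm used here is only shown in the screenshot, so this sketch swaps in a hypothetical logistic regression written with NumPy (plain gradient descent) to stay self-contained. The output column names are modeled on the "Scored Labels" and "Scored Probabilities" columns that Score Model produces for two-class models; the data and the 0.5 threshold are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric features and binary label
train = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3, 2, 1],
    "Fare": [80.0, 60.0, 20.0, 7.0, 8.0, 7.5, 15.0, 90.0],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
})
test = pd.DataFrame({"Pclass": [1, 3], "Fare": [70.0, 7.2]})
features = ["Pclass", "Fare"]

# Standardize with training statistics so gradient descent converges smoothly
mean, std = train[features].mean(), train[features].std()
X = ((train[features] - mean) / std).to_numpy()
y = train["Survived"].to_numpy()

# Train: batch gradient descent on the logistic (cross-entropy) loss
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

# Score: probability of the positive class, then a label at a 0.5 threshold
X_test = ((test[features] - mean) / std).to_numpy()
scored = test.copy()
scored["Scored Probabilities"] = 1.0 / (1.0 + np.exp(-(X_test @ w + b)))
scored["Scored Labels"] = (scored["Scored Probabilities"] >= 0.5).astype(int)
print(scored)
```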

 

Evaluate Model

  • Output

designer21.jpg

 

  • ROC curve

designer22.jpg

 

  • Confusion Matrix

designer23.jpg
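For a two-class model, Evaluate Model reports metrics such as accuracy, precision, recall, F1 score and AUC, alongside the ROC curve and confusion matrix. This NumPy sketch computes the same quantities from hypothetical labels and probabilities, using a 0.5 threshold; AUC is computed as the probability that a random positive is scored above a random negative, which equals the area under the ROC curve.

```python
import numpy as np

# Hypothetical true labels and scored probabilities
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1])
y_pred = (y_prob >= 0.5).astype(int)

# Confusion matrix counts
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2
pos, neg = y_prob[y_true == 1], y_prob[y_true == 0]
auc = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```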

 

This article only shows how to do data engineering in the Azure Machine Learning designer. The model is not accurate, and an open-source dataset is used here.

 

Original article: Samples2021/designerdataengg.md at main · balakreshnan/Samples2021 (github.com)

Posted at https://sl.advdat.com/324GRqc