How to do data engineering in Azure Machine Learning service using the designer
Prerequisites
- Azure Account
- Azure Storage
- Azure Machine Learning service
Introduction
- This tutorial only shows how to do data engineering in Azure Machine Learning service using the designer.
- The data used is the Titanic dataset, a well-known dataset in machine learning.
- It is an open-source dataset.
- Every task (flow item) has parameters and an output.
- After a run, the output of every task can be visualized.
- The output varies depending on the task or flow item.
Overall flow
- Above is the overall experiment
- Built using a low-code environment
- Everything is drag and drop
What's done
Bring in the dataset
Select columns in dataset
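Outside the designer, this step corresponds roughly to choosing a subset of columns in pandas. The following is a minimal sketch, assuming Titanic-style column names; the sample rows and the choice of kept columns are made up for illustration and are not taken from the original experiment.

```python
import pandas as pd

# Tiny made-up sample with Titanic-style columns (illustrative values only).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
})

# Hypothetical selection: keep the label and a few numeric features.
keep = ["Survived", "Pclass", "Age", "Fare"]
selected = df[keep]
print(selected.columns.tolist())
```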
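Execute Python Script - Correlation chart
import seaborn as sn
import matplotlib.pyplot as plt
from azureml.core import Run

# Pairwise correlation between the columns of the input dataset
corrMatrix = dataframe1.corr()
print(corrMatrix)

# Draw the heatmap and save it before show(), which can leave an empty figure
sn.heatmap(corrMatrix, annot=True)
img_file = "corrchart1.png"
plt.savefig(img_file)
plt.show()

# Upload the saved chart to the run record under graphics/
run = Run.get_context(allow_offline=True)
run.upload_file(f"graphics/{img_file}", img_file)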
- Output
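In the designer, the Execute Python Script component calls an entry-point function named azureml_main, which receives up to two input datasets as pandas DataFrames and returns a sequence of DataFrames. The sketch below shows one way the correlation step might be wrapped so that the matrix is also returned as an output dataset. The sample data is made up, and restricting the calculation to numeric columns is an assumption added here, not part of the original script.

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Correlation is only defined for numeric columns, so drop the rest.
    numeric = dataframe1.select_dtypes(include="number")
    corr_matrix = numeric.corr()
    # Keep the row labels as an ordinary column in the returned dataset.
    return (corr_matrix.reset_index(),)

# Local check with made-up data (not the real Titanic values).
sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.1],
    "Name": ["a", "b", "c", "d"],
})
(result,) = azureml_main(sample)
print(result)
```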
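Execute Python Script - Covariance chart
import seaborn as sn
import matplotlib.pyplot as plt
from azureml.core import Run

# Pairwise covariance between the columns of the input dataset
covMatrix = dataframe1.cov()
print(covMatrix)

# Draw the heatmap and save it before show(), which can leave an empty figure
sn.heatmap(covMatrix, annot=True)
img_file = "covchart1.png"
plt.savefig(img_file)
plt.show()

# Upload the saved chart to the run record under graphics/
run = Run.get_context(allow_offline=True)
run.upload_file(f"graphics/{img_file}", img_file)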
- Code
- Output
Remove duplicate rows
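For comparison, a rough pandas equivalent of this step is shown below. It is a sketch only: using PassengerId as the key column and keeping the first occurrence are assumptions, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample in which passenger 2 appears twice.
df = pd.DataFrame({
    "PassengerId": [1, 2, 2, 3],
    "Age": [22.0, 38.0, 38.0, 26.0],
})

# Keep the first row for each key value and drop later repeats.
deduped = df.drop_duplicates(subset=["PassengerId"], keep="first").reset_index(drop=True)
print(deduped)
```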
Normalize data
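The article does not say which normalization method was chosen in the designer. As an illustration, the sketch below applies min-max scaling in pandas, which rescales each selected column to the range 0 to 1. The method, the column names, and the sample values are all assumptions.

```python
import pandas as pd

# Made-up numeric sample.
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 50.0]})

cols = ["Age", "Fare"]  # hypothetical columns to normalize
col_min = df[cols].min()
col_max = df[cols].max()

# Min-max scaling: (x - min) / (max - min), computed per column.
normalized = df.copy()
normalized[cols] = (df[cols] - col_min) / (col_max - col_min)
print(normalized)
```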
Group data in bins
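Binning replaces a continuous value with the index of the interval it falls into. The binning mode used in the experiment is not stated, so the sketch below assumes equal-width bins on a hypothetical Age column, using pandas.cut with made-up values.

```python
import pandas as pd

# Made-up ages spanning a wide range.
df = pd.DataFrame({"Age": [2.0, 18.0, 35.0, 50.0, 80.0]})

# Three equal-width bins over the observed range; labels=False returns bin indices.
df["Age_bin"] = pd.cut(df["Age"], bins=3, labels=False)
print(df)
```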
Edit Metadata to convert String to Categorical column - Name
Edit Metadata to convert String to Categorical column - Cabin
Edit Metadata to convert String to Categorical column - Embarked
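The three Edit Metadata steps above mark string columns as categorical. In pandas the closest counterpart is the category dtype; the sketch below uses made-up rows and is only an approximation of what the designer does internally.

```python
import pandas as pd

# Made-up rows with the three string columns converted in the experiment.
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Cabin": ["C85", None, "E46"],
    "Embarked": ["S", "C", "S"],
})

# Convert each string column to the pandas categorical dtype.
for col in ["Name", "Cabin", "Embarked"]:
    df[col] = df[col].astype("category")

print(df.dtypes)
```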
Clip values - limit outliers to help avoid overfitting
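Clipping caps extreme values at a lower and an upper threshold so that a few outliers do not dominate training. The sketch below assumes percentile-based thresholds (25th and 75th) on a hypothetical Fare column; the thresholds actually used in the experiment are not given in the article.

```python
import pandas as pd

# Made-up fares with one extreme value.
fare = pd.Series([5.0, 7.0, 8.0, 10.0, 500.0], name="Fare")

# Hypothetical thresholds taken from the data's own percentiles.
lower = fare.quantile(0.25)
upper = fare.quantile(0.75)

# Values below lower are raised to lower; values above upper are lowered to upper.
clipped = fare.clip(lower=lower, upper=upper)
print(clipped.tolist())
```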
Clean missing data
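One common way to clean missing data is to fill numeric gaps with the column mean and categorical gaps with the most frequent value. The sketch below shows that approach in pandas; the cleaning mode, the columns, and the sample values are assumptions, since the article does not state which option was selected.

```python
import pandas as pd

# Made-up sample with one missing Age and one missing Embarked value.
df = pd.DataFrame({
    "Age": [20.0, None, 40.0],
    "Embarked": ["S", None, "S"],
})

cleaned = df.copy()
# Numeric column: replace missing values with the mean of the observed values.
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].mean())
# String column: replace missing values with the most frequent value.
cleaned["Embarked"] = cleaned["Embarked"].fillna(cleaned["Embarked"].mode().iloc[0])
print(cleaned)
```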
Apply math operations
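The article does not say which operation was applied. As an illustration only, the sketch below applies a log transform to a hypothetical Fare column with NumPy, a common choice for skewed values; the operation, the column, and the data are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed fares.
df = pd.DataFrame({"Fare": [0.0, 9.0, 99.0]})

# log1p computes log(1 + x), so a fare of 0 maps to 0 instead of -inf.
transformed = df.copy()
transformed["Fare_log"] = np.log1p(transformed["Fare"])
print(transformed)
```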
Split data into training and test data
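A random row split can be sketched in pandas as follows. The 75/25 fraction and the random seed are hypothetical values chosen for the example, and the data is made up; the article does not give the split settings used in the designer.

```python
import pandas as pd

# Made-up dataset of 8 rows.
df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2, 27, 14],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 1],
})

# Randomly sample 75% of the rows for training; the rest become the test set.
train = df.sample(frac=0.75, random_state=42)
test = df.drop(train.index)
print(len(train), len(test))
```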
Bring in the model to train
Train model
Score model
- Output
Evaluate Model
- Output
- ROC curve
- Confusion matrix
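The confusion matrix counts how often each actual class was predicted as each class. The sketch below builds one with pandas from a made-up scored dataset; the column name "Scored Labels" is assumed to match the Score Model output, and the labels themselves are invented for illustration.

```python
import pandas as pd

# Made-up scored rows: actual label and predicted label.
scored = pd.DataFrame({
    "Survived": [1, 0, 1, 1, 0, 0],
    "Scored Labels": [1, 0, 0, 1, 1, 0],
})

# Rows are actual classes, columns are predicted classes.
confusion = pd.crosstab(
    scored["Survived"], scored["Scored Labels"],
    rownames=["Actual"], colnames=["Predicted"],
)

tp = int(confusion.loc[1, 1])  # survived, predicted survived
tn = int(confusion.loc[0, 0])  # did not survive, predicted not survived
fp = int(confusion.loc[0, 1])  # did not survive, predicted survived
fn = int(confusion.loc[1, 0])  # survived, predicted not survived

accuracy = (tp + tn) / len(scored)
print(confusion)
print("accuracy:", accuracy)
```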
This article only shows how to do data engineering in the Azure Machine Learning designer. The model is not accurate, and an open-source dataset is used.
Original article: Samples2021/designerdataengg.md at main · balakreshnan/Samples2021 (github.com)