Why migrate on-premises big data workloads to Azure?

There are many reasons why customers consider migrating their existing on-premises big data workloads to Azure.

Cost of ownership - Running a cluster of computers in an on-premises data center requires considerable administrative effort and significant capital expenditure on hardware. The day-to-day cost of maintaining a cool, stable environment for high-powered computing resources can also be a significant factor.
Uncertainty in the on-premises ecosystem - In the past few years, Hadoop on Cloud has significantly shaped the on-premises Hadoop ecosystem. Mergers, bankruptcies, and license expirations have left customers deeply uncertain about the viability of continuing with their on-premises offerings.
Performance and autoscaling - An on-premises system is a static, one-size-fits-all solution. Scaling is a time-consuming, manual operation that involves a complex array of tasks, and it is not easy to add resources to a live on-premises cluster.
Better VM types - Your on-premises solution might be restricted by the level of hardware available to support the virtual machines necessary to host evolving workloads.
HA/DR - High availability and disaster recovery are a major headache for many on-premises systems, requiring built-in redundancy and well-rehearsed plans for restoring full functionality.
Compliance - In a large-scale commercial system, you may be legally liable for maintaining the appropriate records and audit trails, and for ensuring security.
End of support - Your existing system might be running on end-of-life software that is no longer supported. To ensure stability, you will need to transition to a newer release.

What is Enabling Hadoop Migrations on Azure (EHMA)?

EHMA aims to enable quicker, easier, and more efficient Hadoop migrations, making Azure the preferred cloud for migrating Hadoop workloads.

Many customers with on-premises Hadoop face extensive technical blockers, whether in designing their cloud architectures or in carrying out the migration. Assessing the on-premises Hadoop infrastructure with pre-built scripts and a questionnaire leads to a better-planned migration and clears roadblocks in the early phases.

Prescriptive architecture as a starting point with room to customise - End state architectures are individually curated for each Hadoop stack component on Azure, for IaaS and PaaS respectively.
Documented prescriptive guides - Give the field team a guide for driving Hadoop migrations and deploying base architectures on Azure, to speed up the migration process.
Deployment templates - Deployment of the architectures on Azure is supported by Bicep templates. The templates can launch a configured, ready-to-use infrastructure on Azure for IaaS and PaaS, with all dependent Azure services included.
Comprehensive guidance - The comprehensive guidance focuses on specific recommendations and considerations you can follow to help move your existing Hadoop infrastructure to Azure.
Decision flows - To choose the best landing target, the comprehensive decision trees help you navigate to the best available option according to your requirements.
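
A decision flow of this kind can be represented as a small tree of yes/no questions whose leaves are landing targets. The sketch below is only an illustration under stated assumptions: the question names (`needs_os_level_control`, `managed_service_meets_requirements`) and their order are hypothetical and are not taken from the EHMA decision trees; only the IaaS and PaaS labels come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Question:
    """A yes/no node in a decision flow."""
    key: str                          # requirement checked at this node
    if_yes: Union["Question", str]    # next node, or a landing-target label
    if_no: Union["Question", str]


def choose_target(node: Union[Question, str], requirements: Dict[str, bool]) -> str:
    """Walk the tree until a leaf (a landing-target label) is reached.

    A requirement that is not listed is treated as "no".
    """
    while isinstance(node, Question):
        node = node.if_yes if requirements.get(node.key, False) else node.if_no
    return node


# Hypothetical flow: questions and ordering are illustrative assumptions only.
EXAMPLE_FLOW = Question(
    key="needs_os_level_control",
    if_yes="IaaS",
    if_no=Question(
        key="managed_service_meets_requirements",
        if_yes="PaaS",
        if_no="IaaS",
    ),
)

if __name__ == "__main__":
    print(choose_target(EXAMPLE_FLOW, {"managed_service_meets_requirements": True}))
```

Keeping the flow as data rather than nested `if` statements makes it straightforward to maintain one tree per Hadoop component and to review each tree against the published flowchart.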

Hadoop components migration approach

EHMA focuses on specific guidance and considerations you can follow to help move your existing platform or infrastructure, whether on-premises or in another cloud, to Azure. EHMA covers the following Hadoop ecosystem components:

Component	Description	Decision Flows/Flowcharts
Apache HDFS	Distributed file system	Planning the data migration, Pre-checks prior to data migration
Apache HBase	Column-oriented table service	Choosing landing target for Apache HBase, Choosing storage for Apache HBase on Azure
Apache Hive	Data warehouse infrastructure	Choosing landing target for Hive, Selecting target DB for Hive metadata
Apache Spark	Data processing framework	Choosing landing target for Apache Spark on Azure
Apache Ranger	Framework to monitor and manage data security
Apache Sentry	Framework to monitor and manage data security	Choosing landing targets for Apache Sentry on Azure
Apache MapReduce	Distributed computation framework
Apache Zookeeper	Distributed coordination service
Apache YARN	Resource manager for Hadoop ecosystem
Apache Storm	Distributed real-time computing system	Choosing landing targets for Apache Storm on Azure
Apache Sqoop	Command-line interface tool for transferring data between Apache Hadoop clusters and relational databases	Choosing landing targets for Apache Sqoop on Azure
Apache Kafka	Highly scalable, fault-tolerant distributed messaging system	Choosing landing targets for Apache Kafka on Azure
Apache Atlas	Open-source framework for data governance and metadata management
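
When planning the HDFS data migration, a first question is usually how long a network transfer would take. The following is a rough back-of-the-envelope sketch, not part of the EHMA material: the function name, the decimal-terabyte convention, and the default 70% link utilization are all assumptions made for illustration.

```python
def estimate_transfer_hours(data_tb: float, link_gbps: float, utilization: float = 0.7) -> float:
    """Estimate hours needed to copy `data_tb` terabytes over a network link.

    Assumptions (illustrative only):
      - 1 TB = 10**12 bytes (decimal terabytes)
      - `utilization` is the fraction of nominal bandwidth actually achieved
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")

    data_bits = data_tb * 1e12 * 8                 # terabytes -> bits
    effective_bps = link_gbps * 1e9 * utilization  # usable bits per second
    return data_bits / effective_bps / 3600        # seconds -> hours


if __name__ == "__main__":
    hours = estimate_transfer_hours(data_tb=100, link_gbps=1, utilization=0.7)
    print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")
```

Under these assumptions, 100 TB over a 1 Gbps link works out to roughly 317 hours, or about 13 days. An estimate of this size is one input to deciding between an online transfer and an offline transfer.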

End State Reference Architecture

One of the challenges in migrating workloads from on-premises Hadoop to Azure is getting a deployment that aligns with both the desired end state architecture and the application.

The Bicep deployment template (Reference Architecture Deployment) aims to significantly reduce the effort involved in deploying the PaaS services on Azure, as shown below, and in getting a production-ready architecture up and running.

The above diagram depicts the end state architecture for big data workloads on Azure PaaS, listing all the components deployed as part of the Bicep template deployment. Bicep offers the additional advantage of deploying only the modules you prefer, for a customised architecture.
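
One common way to deploy only selected modules is to expose a boolean parameter per module in the template and pass the choices through a deployment parameters file. The sketch below generates such a file in the standard ARM parameters JSON format. The module names and the `deploy<Name>` parameter naming are hypothetical; the actual EHMA template may expose different parameters, so treat this as an assumption-laden illustration.

```python
import json
from typing import Dict, Iterable

# Schema identifier of the standard ARM deployment parameters file format.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"
)


def build_parameters(available: Iterable[str], selected: Iterable[str]) -> Dict:
    """Build a parameters document with one boolean flag per module.

    Parameter names of the form `deploy<Name>` are hypothetical; the template
    would need matching `bool` parameters guarding each module.
    """
    available = list(available)
    selected = set(selected)
    unknown = selected - set(available)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            f"deploy{name}": {"value": name in selected} for name in available
        },
    }


if __name__ == "__main__":
    # Hypothetical module names, for illustration only.
    doc = build_parameters(["Storage", "Spark", "Kafka"], selected=["Storage", "Spark"])
    print(json.dumps(doc, indent=2))
```

The printed JSON can be saved to a file and supplied as the parameters file when deploying the template, assuming the template declares the corresponding parameters.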

Posted at https://sl.advdat.com/3sQjGee

Advanced Data Solutions

Tuesday, March 8, 2022

Enabling Hadoop Migration to Azure
