Thursday, November 4, 2021

Migrate an On-Premises HBase Cluster to HDInsight

HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. The clusters are configured to store data directly in Azure Storage, which provides low latency and increased elasticity in performance and cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of endpoints, and to analyze this data with Hadoop jobs. HBase and Hadoop are good starting points for big data projects in Azure, and the services can enable real-time applications to work with large datasets. This article walks through approaches for migrating an on-premises HBase cluster to HDInsight HBase.

 

somnathghosh_1-1636008968716.png

 

For best practices and the benefits of migrating to Azure HDInsight, see the following documentation:
https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-motivation

 

Use case: This example demonstrates two approaches to migration.

Copy the hbase folder: With this approach, you copy all HBase data, without being able to select a subset of tables or column families. The snapshot approach provides greater control. HBase uses the default storage selected when creating the cluster, and stores its data and metadata files under the following path: /hbase. Downtime is longer with this approach and grows with the size of the data.

Snapshot: Snapshots enable you to take a point-in-time backup of data in your HBase datastore. Snapshots have minimal overhead and complete within seconds, because a snapshot operation is effectively a metadata operation capturing the names of all files in storage at that instant. At the time of a snapshot, no actual data is copied. Snapshots rely on the immutable nature of the data stored in HDFS, where updates, deletes, and inserts are all represented as new data. You can restore (clone) a snapshot on the same cluster, or export a snapshot to another cluster. Snapshots are taken table by table, and the advantage of this approach is less downtime.

Incremental load: This step is common to both approaches.

Let's run through this step by step.

Option 1: Copying the hbase folder

Step 1: Create a table in the source cluster and insert some data

somnathghosh_2-1636008968720.png
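The commands from the screenshot are not reproduced in this text, so the following is only a minimal sketch of what this step could look like. The table name `Contacts`, the column family `Personal`, and the row values are hypothetical placeholders; the function wraps standard HBase shell commands (`create`, `put`, `scan`) and feeds them to `hbase shell -n`, which runs the shell non-interactively.

```shell
# Hypothetical table and column family names; substitute your own.
TABLE="Contacts"
CF="Personal"

# Create the table, insert two sample cells for one row, and scan the result.
# "hbase shell -n" reads the commands from stdin in non-interactive mode.
create_sample_table() {
  hbase shell -n <<EOF
create '${TABLE}', '${CF}'
put '${TABLE}', '1000', '${CF}:Name', 'John Dole'
put '${TABLE}', '1000', '${CF}:Phone', '1-425-000-0001'
scan '${TABLE}'
EOF
}
```

Calling `create_sample_table` on a head node of the source cluster would run the four commands in one shell session.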

Step 2: Note down the timestamp before stopping the HBase service from Ambari and taking the backup.

somnathghosh_3-1636008968722.png
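HBase cell timestamps are expressed in epoch milliseconds by default, so it helps to record the cutoff in the same unit. The sketch below is one way to do that; the file name is a hypothetical choice, and because it multiplies whole seconds by 1000 the recorded value is rounded down to the start of the current second.

```shell
# Print the current time as epoch milliseconds (second resolution, rounded down).
now_ms() {
  echo "$(( $(date +%s) * 1000 ))"
}

# Hypothetical file used to remember the cutoff for a later incremental export.
TS_FILE="./migration_cutoff_ms.txt"

# Save the cutoff to TS_FILE and echo it for the operator's notes.
record_cutoff() {
  now_ms > "$TS_FILE"
  echo "cutoff_ms=$(cat "$TS_FILE")"
}
```

Running `record_cutoff` just before stopping HBase leaves the value in the file, where it can serve as the start time of the incremental export.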

Step 3: Copy the HBase folder from the source cluster to the destination cluster

somnathghosh_4-1636008968728.png
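One common way to move a directory tree between clusters is `hadoop distcp`, which runs the copy as a parallel MapReduce job. The sketch below assumes that approach; the NameNode address, storage account, container, and staging path are all hypothetical, and the source cluster would need credentials for the destination storage account configured before this could work.

```shell
# Hypothetical endpoints: on-premises NameNode and destination Azure Storage container.
SRC_HBASE="hdfs://onprem-namenode:8020/hbase"
DST_STAGING="wasbs://mycontainer@myaccount.blob.core.windows.net/hbase_source_copy"

# Copy the whole HBase root directory to a staging path in the destination storage.
copy_hbase_folder() {
  hadoop distcp "$SRC_HBASE" "$DST_STAGING"
}
```

HBase should already be stopped on the source (Step 2) when `copy_hbase_folder` runs, so that the files do not change during the copy.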

 

Step 4: Rename the HBase directory to hbase_mig in the destination cluster

somnathghosh_5-1636008968730.png
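A rename on the cluster's default storage can be done with `hdfs dfs -mv`. The following sketch assumes the directory being renamed is `/hbase` (the path named earlier in this article) and that HBase is stopped on the destination cluster while it runs; treat both as assumptions rather than the exact command from the screenshot.

```shell
# Assumed paths: /hbase is the HBase root directory, /hbase_mig the new name.
HBASE_DIR="/hbase"
RENAMED_DIR="/hbase_mig"

# Rename the directory, then list the storage root to confirm the result.
rename_hbase_dir() {
  hdfs dfs -mv "$HBASE_DIR" "$RENAMED_DIR" &&
    hdfs dfs -ls /
}
```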


Step 5: Copy the source HBase data folder to the destination HBase data folder

somnathghosh_6-1636008968732.png


Note:
There are multiple options for copying the data, depending on business requirements; see the link below. Here we used the Hadoop copy command.

Data migration: On-premises Apache Hadoop to Azure HDInsight | Microsoft Docs

 

Step 6: Once the data is copied, restart the HBase services from Ambari in the destination cluster.

 

Step 7: Verify the destination cluster by listing the tables and scanning the migrated table.

somnathghosh_7-1636008968740.png
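As a rough illustration of this check, the sketch below lists the tables, counts the rows, and scans the first few rows of the migrated table. The table name `Contacts` is a hypothetical placeholder, and the row limit is an arbitrary choice to keep the output short.

```shell
# Hypothetical table name; use the table created on the source cluster.
TABLE="Contacts"

# List all tables, count the rows, and show up to five rows of the migrated table.
verify_table() {
  hbase shell -n <<EOF
list
count '${TABLE}'
scan '${TABLE}', {LIMIT => 5}
EOF
}
```

Comparing the `count` result with the same command on the source cluster gives a quick sanity check that nothing was lost.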

Copying the data can take a long time, and the on-premises cluster may still receive writes in the meantime. In that case, use the incremental load described in the Incremental Load in HBase section below.

Option 2: Using the snapshot-based approach

Step 1: Create a table in the source cluster and insert some data

somnathghosh_8-1636008968744.png

Step 2: Flush the table

somnathghosh_9-1636008968745.png

 

Step 3: Disable the table to stop writes

somnathghosh_10-1636008968746.png

 

Step 4: Take a Snapshot

somnathghosh_11-1636008968748.png


Step 5: Note down the timestamp

somnathghosh_12-1636008968749.png

Step 6: Enable the table

somnathghosh_13-1636008968750.png
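Steps 2 through 6 can be sketched as a single function that follows the same order: flush, disable, snapshot, record the timestamp, and enable again. The table and snapshot names are hypothetical, and the timestamp is printed in epoch milliseconds (at second resolution) on the assumption that it will later be used as the start time of the incremental export.

```shell
# Hypothetical table and snapshot names.
TABLE="Contacts"
SNAPSHOT="${TABLE}_snapshot"

snapshot_table() {
  # Steps 2-4: flush the memstore to disk, stop writes, take the snapshot.
  hbase shell -n <<EOF
flush '${TABLE}'
disable '${TABLE}'
snapshot '${TABLE}', '${SNAPSHOT}'
EOF
  # Step 5: record the cutoff while the table is still disabled.
  echo "snapshot_cutoff_ms=$(( $(date +%s) * 1000 ))"
  # Step 6: re-enable the table so the source cluster can accept writes again.
  hbase shell -n <<EOF
enable '${TABLE}'
EOF
}
```

Because no writes can land while the table is disabled, any timestamp taken between the `disable` and the `enable` marks the same boundary between snapshot data and later incremental data.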

Step 7: Export the snapshot to the /hbase folder in the destination HBase cluster's storage

somnathghosh_14-1636008968751.png
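HBase ships an `ExportSnapshot` tool that copies a snapshot's metadata and data files to another HBase root directory as a MapReduce job. The sketch below assumes that tool is what the step uses; the snapshot name, storage account, container, and mapper count are hypothetical values.

```shell
# Hypothetical snapshot name and destination HBase root directory.
SNAPSHOT="Contacts_snapshot"
DEST_HBASE_ROOT="wasbs://mycontainer@myaccount.blob.core.windows.net/hbase"
MAPPERS=4   # arbitrary parallelism; tune to cluster size and bandwidth

# Export the snapshot from the source cluster to the destination /hbase folder.
export_snapshot() {
  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$SNAPSHOT" \
    -copy-to "$DEST_HBASE_ROOT" \
    -mappers "$MAPPERS"
}
```

The `-copy-to` target should be the HBase root directory of the destination cluster, not a table directory, so that the snapshot shows up there under its own name.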

Step 7.1: If you are using Data Box to copy the data, export the snapshot to the same storage or to the Data Box instead. (See Data migration: On-premises Apache Hadoop to Azure HDInsight | Microsoft Docs.)

somnathghosh_15-1636008968751.png

Step 7.2: Copy the snapshot from the Data Box to the /hbase directory of the destination cluster.

somnathghosh_16-1636008968752.png

Step 8: On the destination cluster, verify that the exported snapshot has been imported.

somnathghosh_17-1636008968753.png

Step 9: Clone the snapshot.

somnathghosh_18-1636008968754.png
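Steps 8 and 9 map naturally onto the HBase shell commands `list_snapshots` and `clone_snapshot`. A minimal sketch, assuming the hypothetical snapshot name `Contacts_snapshot` and a target table called `Contacts`:

```shell
# Hypothetical snapshot and table names on the destination cluster.
SNAPSHOT="Contacts_snapshot"
TABLE="Contacts"

# Confirm the snapshot is visible, clone it into a new table, and peek at the data.
restore_from_snapshot() {
  hbase shell -n <<EOF
list_snapshots
clone_snapshot '${SNAPSHOT}', '${TABLE}'
scan '${TABLE}', {LIMIT => 5}
EOF
}
```

`clone_snapshot` creates a new table from the snapshot, so the target table name must not already exist on the destination cluster.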

Step 10: Enable the table

somnathghosh_19-1636008968758.png

 

Incremental Load in HBase

Step 1: Load some incremental data in the source cluster

somnathghosh_20-1636008968760.png

 

Step 2: Note down the end time, then run the export command with the start and end times to export to a local directory.

somnathghosh_21-1636008968761.png
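HBase's MapReduce `Export` tool accepts an optional version count followed by a start time and an end time, which is what makes a time-bounded incremental export possible. The sketch below assumes that tool; the table name, output directory, and version count are hypothetical, and the two timestamps are the epoch-millisecond values noted earlier.

```shell
# Hypothetical table name, export directory, and number of cell versions to export.
TABLE="Contacts"
EXPORT_DIR="/tmp/hbase_incremental/${TABLE}"
VERSIONS=1

# Export only cells written between start_ms and end_ms (epoch milliseconds).
export_increment() {
  start_ms="$1"
  end_ms="$2"
  # Refuse an empty or inverted time window before launching a MapReduce job.
  if [ "$start_ms" -ge "$end_ms" ]; then
    echo "start time must be earlier than end time" >&2
    return 1
  fi
  hbase org.apache.hadoop.hbase.mapreduce.Export \
    "$TABLE" "$EXPORT_DIR" "$VERSIONS" "$start_ms" "$end_ms"
}
```

For example, `export_increment 1636000000000 1636003600000` would export one hour of changes. The output directory must not already exist when the job starts.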

Step 3: Use the DistCp command or Azure Data Factory (ADF) to copy the data.

somnathghosh_22-1636008968763.png

 

Step 4: Run the import command in the destination cluster to import the data.

somnathghosh_23-1636008968764.png
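The counterpart of the export tool is HBase's MapReduce `Import`, which replays an exported directory into an existing table. A minimal sketch, assuming the exported files were copied to the hypothetical path below on the destination cluster's storage:

```shell
# Hypothetical table name and location of the copied export files.
TABLE="Contacts"
IMPORT_DIR="/tmp/hbase_incremental/${TABLE}"

# Load the exported cells into the destination table.
import_increment() {
  hbase org.apache.hadoop.hbase.mapreduce.Import "$TABLE" "$IMPORT_DIR"
}
```

The target table must already exist on the destination cluster, which is the case here once the base data has been migrated with either option above.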

Step 5: Verify the data

somnathghosh_24-1636008968767.png

 

Posted at https://sl.advdat.com/3q9mYZc