hbase-dev mailing list archives

From RuthEvans <ruthevans...@gmail.com>
Subject Tips for Migrating to Apache HBase on Amazon S3 from HDFS
Date Wed, 23 Aug 2017 10:21:29 GMT
Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase
on Amazon S3. Running HBase on S3 gives you several added benefits,
including lower costs, data durability, and easier scalability.

HBase provides several options that you can use to migrate and back up HBase
tables. The steps to migrate to HBase on S3 are similar to the steps for
HBase on the Apache Hadoop Distributed File System (HDFS). However, the
migration can be easier if you are aware of some minor differences and a few
“gotchas.”

In this post, I describe how to use some of the common HBase migration
options to get started with HBase on S3.

HBase migration options
Selecting the right migration method and tools is an important step in
ensuring a successful HBase table migration. However, choosing the right
ones is not always an easy task.

The following HBase utilities can help you migrate to HBase on S3:

Snapshots
Export and Import
CopyTable
[Diagram: summary of the steps for each migration option]

Various factors determine the HBase migration method that you use. For
example, EMR offers HBase version 1.2.3 as the earliest version that you can
run on S3. Therefore, the HBase version that you’re migrating from can be an
important factor in helping you decide. For more information about HBase
versions and compatibility, see the HBase version number and compatibility
documentation in the Apache HBase Reference Guide.

If you’re migrating from an older version of HBase (for example, HBase
0.94), you should test your application to make sure it’s compatible with
newer HBase API versions. You don’t want to spend several hours migrating a
large table only to find out that your application and API have issues with
a different HBase version.

The good news is that HBase provides utilities that you can use to migrate
only part of a table. This lets you test your existing HBase applications
without having to fully migrate entire HBase tables. For example, you can
use the Export, Import, or CopyTable utilities to migrate a small part of
your table to HBase on S3. After you confirm that your application works
with newer HBase versions, you can proceed with migrating the entire table
using HBase snapshots.
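For a small test slice, the Export utility accepts optional trailing arguments for the number of cell versions and a start and end timestamp (milliseconds since the epoch). The following bash function is a minimal sketch, assuming the `hbase` launcher is on the PATH of the source cluster; the function name `export_time_slice` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export a time-bounded slice of an HBase table to S3
# using the Export utility's optional <versions> <starttime> <endtime> args.
set -euo pipefail

# Usage: export_time_slice <table> <s3_dest> <start_ms> <end_ms> [versions]
export_time_slice() {
  local table="$1" dest="$2" start_ms="$3" end_ms="$4" versions="${5:-1}"
  # The time range is [start_ms, end_ms), so an empty or inverted window
  # would export nothing; fail early instead.
  if (( end_ms <= start_ms )); then
    echo "end_ms must be greater than start_ms" >&2
    return 1
  fi
  hbase org.apache.hadoop.hbase.mapreduce.Export \
    "$table" "$dest" "$versions" "$start_ms" "$end_ms"
}
```

For example, `export_time_slice my_table s3://my-bucket/test-slice/ 1500000000000 1500086400000` would export one day of cells (the newest version of each cell, unless you pass a larger version count). The table and bucket names are placeholders.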

Option 1: Migrate to HBase on S3 using snapshots
You can create table backups easily by using HBase snapshots. HBase also
provides the ExportSnapshot utility, which lets you export snapshots to a
different location, like S3. In this section, I discuss how you can combine
snapshots with ExportSnapshot to migrate tables to HBase on S3.

For details about how you can use HBase snapshots to perform table backups,
see Using HBase Snapshots in the Amazon EMR Release Guide and HBase
Snapshots in the Apache HBase Reference Guide. These resources provide
additional settings and configurations that you can use with snapshots and
ExportSnapshot.

The following example shows how to use snapshots to migrate HBase tables to
HBase on S3.

Note: Earlier HBase versions, like HBase 0.94, have a different snapshot
structure than HBase 1.x, which is what you’re migrating to. If you’re
migrating from HBase 0.94 using snapshots, you get a
TableInfoMissingException error when you try to restore the table. For
details about migrating from HBase 0.94 using snapshots, see the Migrating
from HBase 0.94 section.

1. From the source HBase cluster, create a snapshot of your table:
   $ echo "snapshot '<table_name>', '<snapshot_name>'" | hbase shell
2. Export the snapshot to an S3 bucket:
   $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
       -snapshot <snapshot_name> -copy-to s3://<HBase_on_S3_root_dir>/
   For the -copy-to parameter, specify the S3 location that you are using for
   the HBase root directory of your EMR cluster. If your cluster is already up
   and running, you can find its hbase.rootdir value by viewing the cluster’s
   Configurations in the EMR console, or by using the AWS CLI:
   $ aws emr describe-cluster --cluster-id <cluster_id> | grep hbase.rootdir
3. Launch an EMR cluster that uses the S3 storage option with HBase (skip this
   step if you already have one up and running). For detailed steps, see
   Creating a Cluster with HBase Using the Console in the Amazon EMR Release
   Guide. When launching the cluster, make sure that the HBase root directory
   is set to the same S3 location as your exported snapshot (that is, the
   location used in the -copy-to parameter in the previous step).
4. Restore or clone the HBase table from that snapshot.
   To restore the table and keep the same table name as the source table,
   use restore_snapshot:
   $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
   To restore the table under a different name, use clone_snapshot:
   $ echo "clone_snapshot '<snapshot_name>', '<new_table_name>'" | hbase shell

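The four steps above can be collected into a small bash script. This is a sketch rather than a tested tool: it assumes `hbase` is on the PATH of the cluster where each function runs and that the AWS CLI is configured wherever you call `hbase_rootdir`. The function names are hypothetical, and the `--query` expression assumes the root directory was set through the `hbase-site` configuration classification; if it was not, fall back to the `grep` command shown above.

```shell
#!/usr/bin/env bash
# Sketch of Option 1 (snapshots + ExportSnapshot) as bash functions.
# Function names are hypothetical; hbase and aws must be on the PATH.
set -euo pipefail

# Print the S3 hbase.rootdir of a running EMR cluster, assuming it was set
# through the hbase-site configuration classification.
hbase_rootdir() {
  local cluster_id="$1"
  aws emr describe-cluster --cluster-id "$cluster_id" \
    --query "Cluster.Configurations[?Classification=='hbase-site'].Properties.\"hbase.rootdir\"" \
    --output text
}

# Steps 1 and 2, on the source cluster: snapshot the table, then export the
# snapshot into the destination cluster's S3 root directory.
snapshot_and_export() {
  local table="$1" snapshot="$2" s3_rootdir="$3"
  echo "snapshot '${table}', '${snapshot}'" | hbase shell
  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$snapshot" -copy-to "$s3_rootdir"
}

# Step 4, on the destination cluster: restore under the original table name,
# or clone into a new table when a second argument is given.
restore_or_clone() {
  local snapshot="$1" new_table="${2:-}"
  if [[ -z "$new_table" ]]; then
    echo "restore_snapshot '${snapshot}'" | hbase shell
  else
    echo "clone_snapshot '${snapshot}', '${new_table}'" | hbase shell
  fi
}
```

On the source cluster you would run `snapshot_and_export my_table my_snapshot s3://my-bucket/hbase-root/`, and on the destination cluster `restore_or_clone my_snapshot` (or `restore_or_clone my_snapshot my_table_copy` to clone under a new name). `hbase shell` reading from a pipe may exit with status 0 even when a shell command fails, so check its output before moving on.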
Migrating from HBase 0.94 using snapshots
If you’re migrating from HBase version 0.94 using the snapshot method, you
get an error if you try to restore from the snapshot. This is because the
structure of a snapshot in HBase 0.94 is different from the snapshot
structure in HBase 1.x.

The following steps show how to fix an HBase 0.94 snapshot so that it can be
restored to an HBase on S3 table.

1. Complete steps 1 through 3 in the previous example to create and export a
   snapshot.
2. From your destination cluster, follow these steps to repair the snapshot:
   a. Use s3-dist-cp to copy the snapshot data (archive) directory into the
      location where HBase 1.x expects archived table data. The archive
      directory contains your snapshot data and, depending on your table
      size, it might be large, so s3-dist-cp helps make this step faster:
      $ s3-dist-cp --src s3://<HBase_on_S3_root_dir>/.archive/<table_name> \
          --dest s3://<HBase_on_S3_root_dir>/archive/data/default/<table_name>
   b. Create and fix the snapshot descriptor file:
      $ hdfs dfs -mkdir \
          s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
      $ hdfs dfs -mv \
          s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tableinfo.<*> \
          s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
3. Restore the snapshot:
   $ echo "restore_snapshot '<snapshot_name>'" | hbase shell

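The repair steps can be scripted the same way. This sketch assumes the table is in the `default` namespace (as in the commands above) and that `s3-dist-cp`, `hdfs`, and `hbase` are on the PATH of the destination cluster. The function name is hypothetical, and replacing the `.tableinfo.<*>` placeholder with a quoted `*` glob, which `hdfs dfs` then expands against S3 itself, is my assumption.

```shell
#!/usr/bin/env bash
# Sketch of the HBase 0.94 snapshot repair, run on the destination cluster.
# Hypothetical function; assumes the table lives in the default namespace.
set -euo pipefail

# Usage: repair_094_snapshot <s3_root_dir> <table> <snapshot>
repair_094_snapshot() {
  local root="${1%/}" table="$2" snapshot="$3"   # strip a trailing slash
  local snap_dir="${root}/.hbase-snapshot/${snapshot}"

  # Copy 0.94-style archive data into the 1.x archive layout.
  s3-dist-cp --src "${root}/.archive/${table}" \
    --dest "${root}/archive/data/default/${table}"

  # Move the .tableinfo file into the .tabledesc directory that 1.x expects.
  # The glob is quoted so hdfs, not the local shell, expands it.
  hdfs dfs -mkdir "${snap_dir}/.tabledesc"
  hdfs dfs -mv "${snap_dir}/.tableinfo.*" "${snap_dir}/.tabledesc"

  # Restore the repaired snapshot.
  echo "restore_snapshot '${snapshot}'" | hbase shell
}
```

For example: `repair_094_snapshot s3://my-bucket/hbase-root my_table my_snapshot`, where all three arguments are placeholders.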
Option 2: Migrate to HBase on S3 using Export and Import
As I discussed in the earlier sections, HBase snapshots and ExportSnapshot
are great options for migrating tables. But sometimes you want to migrate
only part of a table, so you need a different tool. In this section, I
describe how to use the HBase Export and Import utilities.

The steps to migrate a table to HBase on S3 using Export and Import are not
much different from the steps in the HBase documentation, which also
describes how to use these utilities to migrate only part of a table.

The following steps show how you can use Export and Import to migrate a
table to HBase on S3.

1. From your source cluster, export the HBase table:
   $ hbase org.apache.hadoop.hbase.mapreduce.Export <table_name> \
       s3://<table_s3_backup>/<location>/
2. In the destination cluster, create the target table into which to import
   data. Ensure that the column families in the target table are identical to
   the exported (source) table’s column families.
3. From the destination cluster, import the table using the Import utility:
   $ hbase org.apache.hadoop.hbase.mapreduce.Import '<table_name>' \
       s3://<table_s3_backup>/<location>/

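The three steps above can be sketched as bash functions, assuming `hbase` is on the PATH of each cluster. The function names are hypothetical, and the target table is created with default column family settings; if your source families use non-default settings such as VERSIONS or compression, create the table by hand instead.

```shell
#!/usr/bin/env bash
# Sketch of Option 2 (Export and Import). Function names are hypothetical.
set -euo pipefail

# Step 1, on the source cluster: export the whole table to S3.
export_table() {
  local table="$1" dest="$2"
  hbase org.apache.hadoop.hbase.mapreduce.Export "$table" "$dest"
}

# Steps 2 and 3, on the destination cluster: create the table with the
# given column families (default settings), then import the exported data.
# Usage: create_and_import <table> <s3_src> <cf> [<cf> ...]
create_and_import() {
  local table="$1" src="$2"; shift 2
  if (( $# == 0 )); then
    echo "at least one column family is required" >&2
    return 1
  fi
  local ddl="create '${table}'" cf
  for cf in "$@"; do
    ddl+=", '${cf}'"
  done
  echo "$ddl" | hbase shell
  hbase org.apache.hadoop.hbase.mapreduce.Import "$table" "$src"
}
```

For example, run `export_table my_table s3://my-bucket/export/` on the source cluster, then `create_and_import my_table s3://my-bucket/export/ cf1 cf2` on the destination. Keep in mind that Export writes only the newest version of each cell by default.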
HBase snapshots are usually the recommended method to migrate HBase tables.
However, the Export and Import utilities can be useful for test use cases in
which you migrate only a small part of your table and test your application.
It’s also handy if you’re migrating from an HBase cluster that does not have
the HBase snapshots feature.

Option 3: Migrate to HBase on S3 using CopyTable
Similar to the Export and Import utilities, CopyTable is an HBase utility
that you can use to copy part of HBase tables. However, keep in mind that
CopyTable doesn’t work if you’re copying or migrating tables between HBase
versions that are not wire compatible (for example, copying from HBase 0.94
to HBase 1.x).

For more information and examples, see CopyTable in the HBase documentation.
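As a sketch of copying part of a table, the following function runs CopyTable on the source cluster with a time range and sends the rows to a peer cluster identified by its ZooKeeper quorum, in the `host1,host2,host3:2181:/hbase` form. It assumes the two HBase versions are wire compatible and that the target table already exists on the peer with the same column families; the function name and argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of Option 3 (CopyTable). Run on the source cluster; CopyTable does
# not create the target table. Function name is hypothetical.
set -euo pipefail

# Usage: copy_table_slice <table> <zk_quorum:port:/znode> <start_ms> <end_ms> [new_name]
copy_table_slice() {
  local table="$1" peer="$2" start_ms="$3" end_ms="$4" new_name="${5:-}"
  local -a args=(--starttime="$start_ms" --endtime="$end_ms" --peer.adr="$peer")
  # Optionally write into a differently named table on the peer.
  if [[ -n "$new_name" ]]; then
    args+=(--new.name="$new_name")
  fi
  hbase org.apache.hadoop.hbase.mapreduce.CopyTable "${args[@]}" "$table"
}
```

For example, `copy_table_slice my_table <master_dns>:2181:/hbase 1500000000000 1500086400000`, where `<master_dns>` stands for the address of your destination EMR cluster's master node.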

Conclusion
In this post, I demonstrated how you can use common HBase backup utilities
to migrate your tables easily to HBase on S3. By using HBase snapshots, you
can migrate entire tables to HBase on S3. To test
HBase on S3 by migrating or copying only part of your tables, you can use
the HBase Export, Import, or CopyTable utilities.

If you have questions or suggestions, please comment below.



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Tips-for-Migrating-to-Apache-HBase-on-Amazon-S3-from-HDFS-tp4089926.html
Sent from the HBase Developer mailing list archive at Nabble.com.
