Date: Wed, 23 Aug 2017 03:21:29 -0700 (MST)
From: RuthEvans
To: hbase-dev@hadoop.apache.org
Subject: Tips for Migrating to Apache HBase on Amazon S3 from HDFS

Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase on Amazon S3. Running HBase on S3 gives you several added benefits, including lower costs, data durability, and easier scalability.

HBase provides several options that you can use to migrate and back up HBase tables. The steps to migrate to HBase on S3 are similar to the steps for HBase on the Apache Hadoop Distributed File System (HDFS). However, the migration can be easier if you are aware of some minor differences and a few “gotchas.”

In this post, I describe how to use some of the common HBase migration options to get started with HBase on S3.

HBase migration options

Selecting the right migration method and tools is an important step in ensuring a successful HBase table migration. However, choosing the right ones is not always an easy task. The following HBase options help you migrate to HBase on S3:

- Snapshots
- Export and Import
- CopyTable

Various factors determine the HBase migration method that you use. For example, EMR offers HBase version 1.2.3 as the earliest version that you can run on S3. Therefore, the HBase version that you’re migrating from can be an important factor in helping you decide. For more information about HBase versions and compatibility, see the HBase version number and compatibility documentation in the Apache HBase Reference Guide.

If you’re migrating from an older version of HBase (for example, HBase 0.94), you should test your application to make sure it’s compatible with newer HBase API versions. You don’t want to spend several hours migrating a large table only to find out that your application and API have issues with a different HBase version.

The good news is that HBase provides utilities that you can use to migrate only part of a table. This lets you test your existing HBase applications without having to fully migrate entire HBase tables. For example, you can use the Export, Import, or CopyTable utilities to migrate a small part of your table to HBase on S3. After you confirm that your application works with newer HBase versions, you can proceed with migrating the entire table using HBase snapshots.
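As a concrete illustration of such a partial migration: the Export utility accepts optional trailing arguments for the number of cell versions and a start/end timestamp (epoch milliseconds), which makes it easy to carve out a small, recent slice of a table for a compatibility test. A hedged sketch; the table name, bucket, and time window below are hypothetical placeholders:

# Export at most 1 version of each cell written within the given time
# range. The small result set is enough to exercise your application
# against the new HBase version without migrating the whole table.
$ hbase org.apache.hadoop.hbase.mapreduce.Export 'mytable' \
    s3://my-bucket/hbase-test-export/ 1 1503000000000 1503100000000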
Option 1: Migrate to HBase on S3 using snapshots

You can create table backups easily by using HBase snapshots. HBase also provides the ExportSnapshot utility, which lets you export snapshots to a different location, like S3. In this section, I discuss how you can combine snapshots with ExportSnapshot to migrate tables to HBase on S3.

For details about how you can use HBase snapshots to perform table backups, see Using HBase Snapshots in the Amazon EMR Release Guide and HBase Snapshots in the Apache HBase Reference Guide. These resources provide additional settings and configurations that you can use with snapshots and ExportSnapshot.

The following example shows how to use snapshots to migrate HBase tables to HBase on S3.

Note: Earlier HBase versions, like HBase 0.94, have a different snapshot structure than HBase 1.x, which is what you’re migrating to. If you’re migrating from HBase 0.94 using snapshots, you get a TableInfoMissingException error when you try to restore the table. For details about migrating from HBase 0.94 using snapshots, see the Migrating from HBase 0.94 section.

1. From the source HBase cluster, create a snapshot of your table:

$ echo "snapshot '<table_name>', '<snapshot_name>'" | hbase shell

2. Export the snapshot to an S3 bucket:

$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot_name> -copy-to s3://<bucket>/<folder>

For the -copy-to parameter in the ExportSnapshot utility, specify the S3 location that you are using for the HBase root directory of your EMR cluster. If your cluster is already up and running, you can find its S3 hbase.rootdir value by viewing the cluster’s Configurations in the EMR console, or by using the AWS CLI. Here’s the command to find that value:

$ aws emr describe-cluster --cluster-id <cluster_id> | grep hbase.rootdir

3. Launch an EMR cluster that uses the S3 storage option with HBase (skip this step if you already have one up and running). For detailed steps, see Creating a Cluster with HBase Using the Console in the Amazon EMR Release Guide. When launching the cluster, ensure that the HBase root directory is set to the same S3 location as your exported snapshots (that is, the location used in the -copy-to parameter in the previous step). A hedged CLI sketch of this launch appears after these steps.

4. Restore or clone the HBase table from that snapshot.

To restore the table and keep the same table name as the source table, use restore_snapshot:

$ echo "restore_snapshot '<snapshot_name>'" | hbase shell

To restore the table into a different table name, use clone_snapshot:

$ echo "clone_snapshot '<snapshot_name>', '<new_table_name>'" | hbase shell
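As promised in step 3, here is what launching the cluster from the AWS CLI might look like. This is a minimal sketch, assuming a hypothetical bucket (my-bucket), key pair, and instance sizing; verify the configuration classifications against the current Amazon EMR Release Guide before relying on them:

# Launch an EMR cluster with HBase storing its root directory on S3.
# The "hbase" classification switches the storage mode to S3, and
# "hbase-site" points hbase.rootdir at the location that holds the
# exported snapshots.
$ aws emr create-cluster --name "hbase-on-s3" \
    --release-label emr-5.8.0 --applications Name=HBase \
    --instance-type m4.large --instance-count 3 \
    --ec2-attributes KeyName=my-key --use-default-roles \
    --configurations '[
      {"Classification":"hbase","Properties":{"hbase.emr.storageMode":"s3"}},
      {"Classification":"hbase-site","Properties":{"hbase.rootdir":"s3://my-bucket/hbase"}}
    ]'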
Migrating from HBase 0.94 using snapshots

If you’re migrating from HBase version 0.94 using the snapshot method, you get an error if you try to restore from the snapshot. This is because the structure of a snapshot in HBase 0.94 is different from the snapshot structure in HBase 1.x.

The following steps show how to fix an HBase 0.94 snapshot so that it can be restored to an HBase on S3 table.

1. Complete steps 1 through 3 in the previous example to create and export a snapshot.

2. From your destination cluster, follow these steps to repair the snapshot:

a. Use s3-dist-cp to copy the snapshot data (archive) directory into a new directory. The archive directory contains your snapshot data. Depending on your table size, it might be large. Use s3-dist-cp to make this step faster:

$ s3-dist-cp --src s3://<bucket>/<hbase_rootdir>/.archive/<table_name> --dest s3://<bucket>/<hbase_rootdir>/archive/data/default/<table_name>

b. Create and fix the snapshot descriptor file:

$ hdfs dfs -mkdir s3://<bucket>/<hbase_rootdir>/.hbase-snapshot/<snapshot_name>/.tabledesc
$ hdfs dfs -mv s3://<bucket>/<hbase_rootdir>/.hbase-snapshot/<snapshot_name>/.tableinfo.<*> s3://<bucket>/<hbase_rootdir>/.hbase-snapshot/<snapshot_name>/.tabledesc

3. Restore the snapshot:

$ echo "restore_snapshot '<snapshot_name>'" | hbase shell

Option 2: Migrate to HBase on S3 using Export and Import

As I discussed in the earlier sections, HBase snapshots and ExportSnapshot are great options for migrating tables. But sometimes you want to migrate only part of a table, so you need a different tool. In this section, I describe how to use the HBase Export and Import utilities.

The steps to migrate a table to HBase on S3 using Export and Import are not much different from the steps provided in the HBase documentation. In those docs, you can also find detailed information, including how you can use them to migrate part of a table.

The following steps show how you can use Export and Import to migrate a table to HBase on S3.

1. From your source cluster, export the HBase table:

$ hbase org.apache.hadoop.hbase.mapreduce.Export '<table_name>' s3://<bucket>/<folder>/

2. In the destination cluster, create the target table into which to import data. Ensure that the column families in the target table are identical to the exported/source table’s column families.

3. From the destination cluster, import the table using the Import utility:

$ hbase org.apache.hadoop.hbase.mapreduce.Import '<table_name>' s3://<bucket>/<folder>/

HBase snapshots are usually the recommended method to migrate HBase tables. However, the Export and Import utilities can be useful for test use cases in which you migrate only a small part of your table and test your application. They’re also handy if you’re migrating from an HBase cluster that does not have the HBase snapshots feature.

Option 3: Migrate to HBase on S3 using CopyTable

Similar to the Export and Import utilities, CopyTable is an HBase utility that you can use to copy part of, or entire, HBase tables. However, keep in mind that CopyTable doesn’t work if you’re copying or migrating tables between HBase versions that are not wire compatible (for example, copying from HBase 0.94 to HBase 1.x).

For more information and examples, see CopyTable in the HBase documentation. (A hedged sample invocation appears at the end of this post.)

Conclusion

In this post, I demonstrated how you can use common HBase backup utilities to migrate your tables easily to HBase on S3. By using HBase snapshots, you can migrate entire tables to HBase on S3. To test HBase on S3 by migrating or copying only part of your tables, you can use the HBase Export, Import, or CopyTable utilities.

If you have questions or suggestions, please comment below.
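P.S. Since Option 3 doesn’t show a command, here is a sketch of what a CopyTable invocation might look like. The source table (mytable), time window, destination table name, and the destination cluster’s ZooKeeper quorum are all hypothetical placeholders:

# Copy cells written in a given time window into a table on the
# destination cluster, addressed by its ZooKeeper quorum.
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --starttime=1503000000000 --endtime=1503100000000 \
    --new.name=mytable_copy \
    --peer.adr=dest-zk-host:2181:/hbase \
    mytable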