From: "Dan Zinngrabe (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Mon, 22 Sep 2008 16:03:44 -0700 (PDT)
Subject: [jira] Updated: (HBASE-897) Backup/Export/Import Tool

     [ https://issues.apache.org/jira/browse/HBASE-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Zinngrabe updated HBASE-897:
--------------------------------

    Attachment: hbase_backup_release.tar.gz

Unpack the tarball and run 'ant build' to create the binary. Documentation is included in the readme. Note that while this has primarily been used with HBase 0.1.2 and 0.1.3, it should be usable on newer versions with little or no modification.

> Backup/Export/Import Tool
> -------------------------
>
>                 Key: HBASE-897
>                 URL: https://issues.apache.org/jira/browse/HBASE-897
>             Project: Hadoop HBase
>          Issue Type: New Feature
>    Affects Versions: 0.1.2, 0.1.3
>         Environment: MacOS 10.5.4, CentOS 5.1
>            Reporter: Dan Zinngrabe
>            Priority: Minor
>         Attachments: hbase_backup_release.tar.gz
>
>
> Attached is a simple import, export, and backup utility. Mahalo.com has been using it in production for several months to back up our HBase clusters, as well as to migrate data from production clusters to development clusters.
> Documentation included below is from the readme.
>
> HBase Backup
> author: Dan Zinngrabe dan@mahalo.com
> ------------------
>
> Summary:
>
> A simple MapReduce job for exporting data from an HBase table. The exported data is in a simple, flat format that can then be imported using another MapReduce job. This gives you both a backup capability and a simple way to import and export data from tables.
>
> Backup File Format
> ------------------
>
> The output of a backup job is a flat text file, or a series of flat text files. Each row is represented by a single line, with the items on it tab-delimited. Column names are plain text, while column values are base64 encoded, which keeps tabs and line breaks in the data from breaking the format. Generally you should not have to worry about this at all.
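>
> As a rough illustration of this format (this sketch is not part of the attached tool), a single line of a backup file could be decoded as below. The field order, row key first and then alternating column name and base64 value, is an assumption read off the description above, and java.util.Base64 is a Java 8+ API that postdates the tool, which would have used an older codec:
>
> import java.util.Base64;
>
> // Hypothetical decoder for one line of the backup format described
> // above, assumed to look like:
> //   rowKey<TAB>column<TAB>base64Value<TAB>column<TAB>base64Value ...
> public class BackupLineDecoder {
>
>     public static void decode(String line) {
>         String[] fields = line.split("\t");
>         System.out.println("row: " + fields[0]);
>         // The remaining fields alternate between a plain-text column
>         // name and its base64-encoded value.
>         for (int i = 1; i + 1 < fields.length; i += 2) {
>             byte[] value = Base64.getDecoder().decode(fields[i + 1]);
>             System.out.println("  " + fields[i] + " = " + new String(value));
>         }
>     }
>
>     public static void main(String[] args) {
>         // A value containing a tab survives because values are stored
>         // base64 encoded, never raw.
>         String encoded = Base64.getEncoder().encodeToString("hello\tworld".getBytes());
>         decode("row1\ttext_data:\t" + encoded);
>     }
> }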
>
> Setup and installation
> ------------------
>
> First, make sure your Hadoop installation is configured to load the HBase classes. The easiest way is to edit hadoop-env.sh to include HBase's jar libraries. You can add the following to hadoop-env.sh to have it load the HBase classes:
>
> export HBASE_HOME=/Users/quellish/Desktop/hadoop/hbase-0.1.2
> export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.1.2.jar:$HBASE_HOME/conf:$HBASE_HOME/hbase-0.1.2-test.jar
>
> Second, make sure the hbase-backup.jar file is on Hadoop's classpath as well. While you could put it into a system-wide classpath directory such as ${JAVA_HOME}/lib, it is much easier to put it into ${HADOOP_HOME}/lib.
>
> With that done, you are ready to go. Start up Hadoop and HBase normally and you will be able to run a backup and restore.
>
> Backing up
> ------------------
>
> Backups are run using the Exporter class. From ${HADOOP_HOME}:
>
> bin/hadoop com.mahalo.hadoop.hbase.Exporter -output backup -table text -columns text_flags: text_data:
>
> This writes the backup into the new directory "backup" in the Hadoop file system, backing up the columns "text_flags:" and "text_data:" along with each row's identifier. The colons are required in the column names. The job produces multiple files in the output directory; simply 'cat' them together to form a single file (see "Combining a file from pieces using cat" below). Note that if the output directory already exists, the job will stop; this may be changed in a future version. The output directory can also be any file system path or URL that Hadoop understands, such as an S3 URL.
>
> Restoring from a backup
> ------------------
>
> From ${HADOOP_HOME}:
>
> bin/hadoop com.mahalo.hadoop.hbase.Importer backup/backup.tsv text
>
> This loads a single file that you 'cat'd together from the job's parts, backup/backup.tsv, into the table "text". Note that the table must already exist; it can have data in it, but those values may be overwritten by the restore. You can create the table easily using HBase's shell (a programmatic sketch also appears at the end of this readme). The backup file can be loaded from any URL that Hadoop understands, such as a file URL or an S3 URL. A path not formatted as a URL (as shown above) is resolved relative to your user directory in the Hadoop file system.
>
> Combining a file from pieces using cat
> ------------------
>
> As mentioned above, a MapReduce job typically produces several output files that must be assembled into a single file. On a unix system this is fairly easy to do with cat and find. First, copy your data from the Hadoop file system to the local file system:
>
> bin/hadoop dfs -copyToLocal backup ~/mybackups
>
> Then:
>
> cd ~/
> find mybackups/. -name "part-00*" | xargs cat >> backup.tsv
>
> This takes all the files in the "mybackups" directory matching the pattern "part-00*" and combines them into a single file, backup.tsv.
>
> Troubleshooting
> ------------------
>
> During a restore/import, it is normal for regionservers to split or become temporarily unavailable, and the application will recover from it. You may see errors in the logs, but this is expected.
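>
> Creating the table programmatically
> ------------------
>
> As noted in the restore section, the shell is the easy way to create the target table. For completeness, here is a hedged sketch of doing the same through the HBase 0.1-era Java client API; class locations and constructors shifted in later releases, so treat it as illustrative rather than exact. It creates the "text" table used in the examples above, with the two column families the Exporter command references:
>
> import java.io.IOException;
> import org.apache.hadoop.hbase.HBaseAdmin;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
>
> // Illustrative only: package layout and constructors follow the
> // 0.1-era API, where column family names carry a trailing colon.
> public class CreateTextTable {
>     public static void main(String[] args) throws IOException {
>         HBaseConfiguration conf = new HBaseConfiguration();
>         HBaseAdmin admin = new HBaseAdmin(conf);
>         HTableDescriptor desc = new HTableDescriptor("text");
>         desc.addFamily(new HColumnDescriptor("text_flags:"));
>         desc.addFamily(new HColumnDescriptor("text_data:"));
>         admin.createTable(desc);
>     }
> }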