hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/HowToMigrate" by stack
Date Thu, 23 Jul 2009 20:45:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:

  It will disable compression, setting all column families to no compression.  Re-enable
compression manually after migration (rare should be the person who has compression enabled
-- talk to us if this is a problem).
- ==== Preparing for Migration ====
- You MUST do a few things before you can begin migration of either hadoop or hbase.
- ===== Major Compacting all Tables =====
- Before you begin, you MUST run a major compaction on all tables, including the .META. table.
 A major compaction compacts all store files in a family together, dropping deleted and expired
cells.  Major compaction is necessary because the way deletes work changed in hbase 0.20.
 Migration will not work unless you complete major compaction.  Use the shell to start
up major compactions.  For example, the below cluster has only one table, named 'a'.  See how
we run a major compaction on each:
- {{{stack@connelly:~/checkouts/hbase/branches/0.19$ ./bin/hbase shell
- HBase Shell; enter 'help<RETURN>' for list of supported commands.
- Version: 0.19.4, r781868, Tue Jul 14 11:27:58 PDT 2009
- hbase(main):001:0> list
- a                                                                                      
- 2 row(s) in 0.1251 seconds
- hbase(main):002:0> major_compact 'a'
- 0 row(s) in 0.0400 seconds
- hbase(main):003:0> major_compact '.META.'
- 0 row(s) in 0.0245 seconds
- hbase(main):004:0> major_compact '-ROOT-'
- 0 row(s) in 0.0173 seconds}}}
- In the above, the compaction took no time.  The case will likely be different for you if
you have big tables.
- The way to confirm that the major compaction completed is to do a listing of the hbase rootdir
in hdfs.  If major compaction succeeded, each store of each region on the filesystem should
have one mapfile only.  For example, below we list what's under the 'a' table directory
under the hbase rootdir:
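With many tables, typing each major_compact by hand gets tedious.  A hypothetical helper (the table names below are just the ones from the example above) generates the commands, which on a real cluster you would pipe into {{{./bin/hbase shell}}}:

```shell
# Hypothetical helper: generate major_compact commands for a list of tables.
# On a real cluster, pipe the output into:  ./bin/hbase shell
TABLES="a .META. -ROOT-"
CMDS=""
for t in $TABLES; do
  # Append one shell command per table, newline-terminated.
  CMDS="${CMDS}major_compact '${t}'
"
done
printf '%s' "$CMDS"
```

This only emits the commands; nothing is compacted until the output is fed to the hbase shell.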
- {{{/tmp/hbase-stack/hbase/a
- /tmp/hbase-stack/hbase/a/1833721875
- /tmp/hbase-stack/hbase/a/1833721875/a
- /tmp/hbase-stack/hbase/a/1833721875/a/info
- /tmp/hbase-stack/hbase/a/1833721875/a/info/8167759949199600085
- /tmp/hbase-stack/hbase/a/1833721875/a/info/.8167759949199600085.crc
- /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles
- /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085
- /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/data
- /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/.data.crc
- /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/.index.crc
- /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/index}}}
- There is one column family in this table named 'a' (unfortunately, since it muddles the
example, the table name is also 'a').  The table has one region whose encoded name is 1833721875.
 Under this region directory, there are family directories -- in this case there is one for
the 'a' family -- and under each family directory, there is the {{{info}}} -- for store file
metadata -- and the {{{mapfiles}}} directories.  There is only one mapfile in our case above,
named 8167759949199600085 (MapFiles are made of data and index files).
- You cannot migrate unless everything has been major compacted first.
- -ROOT- and .META. flush frequently, so they can mess up your nice and tidy single-file-per-store
major_compacted hbase layout.  They won't flush if there have not been edits, so make
sure your cluster is not taking writes and hasn't been doing so for a good while before starting
up the major compaction process.  Getting your cluster to shut down with only one file in -ROOT-
and .META. may be a bit tough, so to help, a facility has been added to the HEAD of the 0.19
branch that lets you major compact the catalog regions in a shut-down hbase.  This facility
only works on the -ROOT- and .META. catalog tables, not on user space tables.  For usage, run:
- {{{./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion}}}
- For example, to major compact the -ROOT-:
- {{{$ ./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion hdfs://aa0-000-12:9002/hbasetrunk2/-ROOT- major_compact}}}
- Don't forget the 'major_compact' off the end, else it just lists out the content of the region.
- I had to copy the hadoop-site.xml to a location where it would be picked up by the above
script -- e.g. from my hadoop 0.19 install to my {{{$HBASE_HOME/conf}}} -- so the script
could find the right HDFS; otherwise it ran against the local filesystem.
- ===== Can you back up your data? =====
- Migration has been tested, but if you have sufficient space in hdfs to make a copy of your
hbase rootdir, do so, just in case.  Use hdfs distcp.
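A backup with distcp might look like the following sketch.  The namenode URI and paths are examples only, not taken from this page; substitute your own rootdir.  Here the command is only printed, not executed:

```shell
# Backup sketch; namenode URI and paths below are illustrative examples.
SRC="hdfs://namenode:9000/hbase"
DST="hdfs://namenode:9000/hbase-backup"
# On a real cluster you would run the command itself; here we just build and
# print it so the shape is visible.
CMD="./bin/hadoop distcp $SRC $DST"
echo "$CMD"
```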
- ==== Migrating ====
- Migrate hadoop.  Refer to the [http://wiki.apache.org/hadoop/Hadoop%20Upgrade Hadoop Upgrade] page.
- Migrate HBase.  The bulk of the time involved in migration is the rewriting of the hbase storefiles
from their 0.19 format into the new 0.20 format.  Each rewrite takes about 6-10 seconds.
In the filesystem, count roughly how many regions you have (or get the count off the UI).  Multiply
regions * 10 seconds.  If the migration will take longer than you are prepared to wait, there
is a mapreduce job to do the file conversions only:
- {{{$./bin/hadoop jar hbase.jar hsf2sf}}}
- This job takes an empty input and output directory.  It will first run through your filesystem
to find all mapfiles to convert, write a file to the input directory, and then start up the mapreduce
job to do the conversions.
- Now, run the hbase migration script.  If you have run the mapreduce job, the script will notice
that all storefiles have been rewritten and will skip the rewrite step.  Otherwise, the migration
script does the rewrite first.
- {{{$./bin/hbase migrate upgrade}}}
- ==== Post-Migration ====
- Make sure you replace everything under {{{$HBASE_HOME/conf}}} with files from the new release.
 For example, be sure to replace your old hbase-default.xml with the version from the new
hbase release.
- Read the new 'Getting Started' carefully before starting up your cluster.  Basic configuration
properties have changed.  For example, {{{hbase.master}}}/{{{hbase.master.hostname}}} are no
longer used; they are replaced by {{{hbase.cluster.distributed}}}.  See the 'Getting Started'
for detail on how to set the new properties.  While your cluster will likely come up on the
old configuration settings, you should move to the new configuration.
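As a concrete illustration, a minimal new-style hbase-site.xml might look like the following sketch.  The rootdir URI is an example, not from this page; check the new hbase-default.xml for the full property list:

```xml
<!-- Minimal sketch of new-style 0.20 configuration; values are examples. -->
<configuration>
  <!-- Replaces the old hbase.master/hbase.master.hostname properties. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Same rootdir setting as before migration; URI here is illustrative. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:9000/hbase</value>
  </property>
</configuration>
```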
- == From 0.1.x to 0.2.x or 0.18.x ==
- The following are step-by-step instructions for migrating from HBase 0.1 to 0.2 or 0.18.
 Migration from 0.1 to 0.2 requires an upgrade from Hadoop 0.16 to 0.17, and migration from
0.1 to 0.18 requires an upgrade from Hadoop 0.16 to 0.18. The [http://wiki.apache.org/hadoop/Hadoop%20Upgrade
Hadoop Upgrade Instructions] are slightly out-of-date (as of this writing, September 2008).
 As such, the below instructions also clarify the necessary steps for upgrading Hadoop.
- Assume Hadoop 0.16 and HBase 0.1 are already running with data you wish to migrate to the new HBase:
-  * Stop HBase 0.1.
-  * From the [http://wiki.apache.org/hadoop/Hadoop%20Upgrade Hadoop Upgrade Instructions],
perform steps 1-4 and 9-10 (and optionally 5-8, 11-12) on your instance of Hadoop 0.16.
-  * Run {{{{$HADOOP_HOME_0.17}/bin/start-dfs.sh -upgrade}}}
-  * Perform Hadoop upgrade steps 16-19 on your instance of Hadoop 0.17.
-  * Run {{{{$HADOOP_HOME_0.17}/bin/hadoop dfsadmin -finalizeUpgrade}}}
-  * Download and configure HBase 0.2.  Make sure ''hbase.rootdir'' is configured to be the
same as it was in HBase 0.1.
-  * Run {{{{$HBASE_HOME_0.2}/bin/hbase migrate upgrade}}}
-  * Start HBase 0.2.
- As you will notice, the [http://wiki.apache.org/hadoop/Hadoop%20Upgrade Hadoop Upgrade Instructions]
(specifically steps 2-4, 16-18) ask you to generate several logs to compare and ensure that
the upgrade ran correctly.  I did notice some inconsistency in my logs between ''dfs-v-old-report-1.log''
and ''dfs-v-new-report-1.log''; specifically the ''Total effective bytes'' and ''Effective
replication multiplier'' fields did not match (in the new log, the values reported were zero
and infinity, respectively).  Additionally, ''dfs-v-new-report-1.log'' claimed that the upgrade
was not finalized.  Running {{{{$HADOOP_HOME}/bin/hadoop dfsadmin -finalizeUpgrade}}} resolves
the second issue, finalizing the upgrade as expected.  I could not find a way to resolve the
inconsistencies with the ''Total effective bytes'' and ''Effective replication multiplier''
fields.  Nonetheless, I found no problems with the migration and the data appeared to be completely
intact.
- The API in 0.2 is not backward-compatible with hbase 0.1 versions.  See [http://wiki.apache.org/hadoop/Hbase/Plan-0.2/APIChanges
API Changes] for discussion of the main differences.
