hbase-user mailing list archives

From 伍照坤 <tonywu...@gmail.com>
Subject load hbase data directly from hfile with pre-split regions
Date Fri, 26 Jun 2015 17:00:35 GMT
Hi guys,

Can I load HFiles into pre-split regions when copying a table from another
cluster to a new cluster?

Here is what I do now:
1. distcp the HFile folder from the source cluster to the destination cluster.
2. grep the region names from the HBase table folder and use them to pre-split
the table in the destination cluster; I expect this to avoid HFile splitting.
3. Use hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to
load the region files into the table.
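
Step 3 could be scripted roughly as one bulk-load invocation per copied region directory. This is only a sketch under assumptions: the helper name `bulkload_commands` and the staging path `/tmp/Table1` are made up for illustration, and it assumes each copied region directory contains one subdirectory per column family holding the HFiles, which is the layout LoadIncrementalHFiles expects as its input directory. It only builds the command lines and does not run them.

```python
# Hypothetical helper: builds (does not execute) one bulk-load command
# per region directory copied over with distcp.
TOOL = "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles"


def bulkload_commands(staging_dir, region_dirs, table):
    """Return one argv list per region directory under staging_dir."""
    base = staging_dir.rstrip("/")
    commands = []
    for region in region_dirs:
        # Usage of the tool is: <hfile output dir> <table name>
        commands.append(["hbase", TOOL, base + "/" + region, table])
    return commands


if __name__ == "__main__":
    regions = [
        "11111111111111111111111111111111",
        "22222222222222222222222222222222",
        "33333333333333333333333333333333",
        "44444444444444444444444444444444",
    ]
    # /tmp/Table1 is an assumed staging path on the destination cluster.
    for argv in bulkload_commands("/tmp/Table1", regions, "itemtest:Table1"):
        print(" ".join(argv))
```

Each printed line can be reviewed before being run by hand on the destination cluster.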

For example,

the table's HFiles on HDFS from the source cluster look like this:
drwxr-xr-x   - hbase hbase          0 2015-06-25 22:32
/hbase/data/itemtest/Table1/.tabledesc
drwxr-xr-x   - hbase hbase          0 2015-06-25 22:32
/hbase/data/itemtest/Table1/.tmp
drwxr-xr-x   - hbase hbase          0 2015-06-25 22:32
/hbase/data/itemtest/Table1/11111111111111111111111111111111
drwxr-xr-x   - hbase hbase          0 2015-06-25 22:32
/hbase/data/itemtest/Table1/22222222222222222222222222222222
drwxr-xr-x   - hbase hbase          0 2015-06-25 22:32
/hbase/data/itemtest/Table1/33333333333333333333333333333333
drwxr-xr-x   - hbase hbase          0 2015-06-25 22:32
/hbase/data/itemtest/Table1/44444444444444444444444444444444


Then I want to create a pre-split table with these regions
['11111111111111111111111111111111','22222222222222222222222222222222','33333333333333333333333333333333','44444444444444444444444444444444'],

so in the hbase shell I do this:

hbase(main):001:0> create 'itemtest:Table1', {NAME => 'BaseInfo',
DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>'NONE', REPLICATION_SCOPE =>
'0', COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL => '2147483647',
MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
IN_MEMORY => 'false'}, {MAX_FILESIZE => 10737418240}, {SPLITS =>
['11111111111111111111111111111111','22222222222222222222222222222222','33333333333333333333333333333333','44444444444444444444444444444444']}
0 row(s) in 0.8080 seconds


But when I look at the HDFS path, I find the region folder names are not the
same as on the source cluster.
[tw79@e3ecmrhdp03 ~]$ hadoop fs -ls /hbase/data/itemtest/Table1
Found 7 items
drwxr-xr-x   - hbase hbase          0 2015-06-26 09:41
/hbase/data/itemtest/Table1/.tabledesc
drwxr-xr-x   - hbase hbase          0 2015-06-26 09:41
/hbase/data/itemtest/Table1/.tmp
drwxr-xr-x   - hbase hbase          0 2015-06-26 09:41
/hbase/data/itemtest/Table1/09f1d9847762b45c5f095bb9b5dad986
drwxr-xr-x   - hbase hbase          0 2015-06-26 09:41
/hbase/data/itemtest/Table1/0df1dbcc531b451504238c21ec1c06b9
drwxr-xr-x   - hbase hbase          0 2015-06-26 09:41
/hbase/data/itemtest/Table1/285be1d39434ab6599f3966537229db5
drwxr-xr-x   - hbase hbase          0 2015-06-26 09:41
/hbase/data/itemtest/Table1/2b8944234f41ad29a5722f87ac954d2c
drwxr-xr-x   - hbase hbase          0 2015-06-26 09:41
/hbase/data/itemtest/Table1/e337e82c720a8559fdabd611e1ee21b3
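
A note on the listing above: the SPLITS option takes row-key boundaries, so four split points yield five regions, which matches the five region directories shown. The directory names themselves are encoded region names that HBase derives by hashing the full region name, so they would not be expected to equal the strings passed to SPLITS. A minimal sketch of how split points map to key ranges (the function name `regions_from_splits` is made up for illustration, and an empty string stands for an open start or end key):

```python
def regions_from_splits(split_keys):
    """Return (start_key, end_key) pairs for a table pre-split at split_keys.

    An empty string marks the open start of the first region and the
    open end of the last region.
    """
    keys = sorted(split_keys)
    bounds = [""] + keys + [""]
    # Pair each boundary with the next one: N split points -> N + 1 regions.
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    splits = ["1" * 32, "2" * 32, "3" * 32, "4" * 32]
    for start, end in regions_from_splits(splits):
        print(repr(start), "->", repr(end))
```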


Any suggestions on this?

Thanks.

tonywutao
