hbase-user mailing list archives

From "qihuang.zheng"<qihuang.zh...@fraudmetrix.cn>
Subject Re: completebulkload not mv or rename but copy and split many attempt times
Date Fri, 25 Dec 2015 09:49:26 GMT
You are right. Previously I bulkloaded one folder as an experiment, which was really fast, and the
next bulkload caused splits that took much longer.
I know why this happens: we have many txt files, and I launch a separate importtsv MR job for each
txt file.
Each MR job generates HFiles with an ordered key range, but across jobs the HFiles are not
globally ordered!
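To illustrate (a standalone sketch of my own, not code from this thread): because the row key is an MD5 hash, the rows in any single input file hash across the whole keyspace, so each per-file job's HFiles cover nearly the full key range and overlap the ranges from every other job. The file names and row values below are made up:

```python
import hashlib

def hfile_key_range(rows):
    """Return the (first, last) MD5 row-key range one job's HFile would cover."""
    keys = sorted(hashlib.md5(r.encode()).hexdigest() for r in rows)
    return keys[0], keys[-1]

# Two hypothetical input files with completely different row contents
file_a = [f"user_{i}" for i in range(1000)]
file_b = [f"order_{i}" for i in range(1000)]

ra, rb = hfile_key_range(file_a), hfile_key_range(file_b)
# Both ranges start near 000... and end near fff..., so they overlap heavily
overlap = max(ra[0], rb[0]) < min(ra[1], rb[1])
print(ra, rb, overlap)
```

With hashed keys, every per-file HFile spans almost the entire keyspace, which is exactly the situation that forces completebulkload to split instead of rename.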


Our row key is MD5. The table has 100 billion records, 6 TB in total, and each original file ranges
from 100 MB to 100 GB.
That's why I launch many MR jobs in parallel, and that's where the problem occurred,
even though I created pre-split regions with `{NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}`.


The only fix I have found so far is to use a single importtsv MR job,
whose reduce phase produces globally ordered HFiles that satisfy HBase's key ranges.
I also changed the pre-split key range to 000-fff (16*16*16 = 4096 regions in total).
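For reference, a tiny sketch (my own illustration, not HBase's HexStringSplit implementation) of what the 000-fff pre-split means: 4096 regions over a three-hex-digit prefix space need 4095 split keys, "001" through "fff":

```python
# 4096 regions require 4095 split points; region 0 holds everything below "001"
split_keys = [format(i, "03x") for i in range(1, 16 ** 3)]

print(len(split_keys) + 1)            # number of regions
print(split_keys[0], split_keys[-1])  # first and last split key
```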


But as you know, the original txt files are very large, so not only the number of map tasks but
also the number of reduce tasks becomes large,
and this may also take a long time to finish.
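As a rough illustration of the task counts involved (my own back-of-the-envelope arithmetic, assuming the common 128 MB split size; your cluster's block size may differ):

```python
total_bytes = 6 * 1024 ** 4          # ~6 TB of input
split_size = 128 * 1024 ** 2         # 128 MB input splits (assumed default)
map_tasks = -(-total_bytes // split_size)   # ceiling division

# With HFileOutputFormat's total-order partitioning, the reducer count is
# typically tied to the region count, e.g. 4096 reducers for 4096 regions.
print(map_tasks)
```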


Is there any way to load such huge data into HBase quickly?
I have also checked Cassandra and other KV stores, but the first unavoidable step, reading the huge
original txt files, is also too slow.






tks, qihuang.zheng


Original Message
From: WangYQ <wangyongqiang0617@163.com>
To: user <user@hbase.apache.org>
Date: Wednesday, December 23, 2015, 16:52
Subject: Re: completebulkload not mv or rename but copy and split many attempt times


This is because the table regions changed and no longer match the regions that existed when you
generated the HFiles. If the bulkload process completes, the files should be moved into HBase; I
think it is better to delete all HFiles and dirs once the bulkload is over.

At 2015-12-23 16:35:10, "qihuang.zheng" <qihuang.zheng@fraudmetrix.cn> wrote:

I have HFiles generated by importtsv, and the files are really large, from 100 MB to 10 GB. I have
changed hbase.hregion.max.filesize to 50 GB (53687091200) and also made the src
CanonicalServiceName the same as HBase's.

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://tdhdfs/user/tongdun/id_hbase/1 data.md5_id2
HADOOP_CLASSPATH=`hbase classpath` hadoop jar hbase-1.0.2/lib/hbase-server-1.0.2.jar completebulkload /user/tongdun/id_hbase/1 data.md5_id2

But neither completebulkload nor LoadIncrementalHFiles simply mv/renamed the HFiles as expected;
instead, copying and splitting happened, which takes a long time. The log "Split occured while
grouping HFiles, retry attempt XXX" creates child _tmp dirs one level at a time:

2015-12-23 15:52:04,909 INFO [LoadIncrementalHFiles-0] hfile.CacheConfig: CacheConfig:disabled
2015-12-23 15:52:05,006 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:52:05,007 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae no longer fits inside a single region. Splitting...
2015-12-23 15:53:38,639 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top
2015-12-23 15:53:39,173 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
2015-12-23 15:53:39,186 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f733d2c504f22f71b191014d72e4d124
2015-12-23 15:53:39,188 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top first=f733d2c6407f5758e860195b6d2c10c1 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:53:39,189 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top no longer fits inside a single region. Splitting...
2015-12-23 15:54:27,722 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top
2015-12-23 15:54:28,557 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 2 with 2 files remaining to group or split
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-4] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom first=f733d2c6407f5758e860195b6d2c10c1 last=f77c7d357a76ff92bb16ec1ef79f31fb
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top first=f77c7d3915c9a8b71c83c414aabd587d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:08,992 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top
2015-12-23 15:55:09,424 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 3 with 2 files remaining to group or split
2015-12-23 15:55:09,431 INFO [LoadIncrementalHFiles-7] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom first=f77c7d3915c9a8b71c83c414aabd587d last=f7c525a83ee19ea166414e972c5d5541
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:42,165 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top
2015-12-23 15:55:42,490 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 4 with 2 files remaining to group or split
2015-12-23 15:55:42,498 INFO [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f80dcce8a4a14be406ddd1bdebc2eda2
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top first=f80dccecf159d4999cb8e17446103d72 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:09,560 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top
2015-12-23 15:56:09,933 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 5 with 2 files remaining to group or split
2015-12-23 15:56:09,942 INFO [LoadIncrementalHFiles-13] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom first=f80dccecf159d4999cb8e17446103d72 last=f85673f473ead63c89e96c83b2058ca7
2015-12-23 15:56:09,943 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top first=f85673fde3138dac07ce08881c9d0ccc last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:09,944 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:30,890 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top
2015-12-23 15:56:31,145 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 6 with 2 files remaining to group or split
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-16] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom first=f85673fde3138dac07ce08881c9d0ccc last=f89f12a56b5af206188639f736877563
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top first=f89f12a59e4a9c9bcbb42d0504318e25 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:44,959 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top
2015-12-23 15:56:46,826 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 7 with 2 files remaining to group or split
2015-12-23 15:56:46,832 INFO [LoadIncrementalHFiles-19] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom first=f89f12a59e4a9c9bcbb42d0504318e25 last=f8e7bc423ca4799459898439bf0f68b2
2015-12-23 15:56:46,833 INFO [LoadIncrementalHFiles-20] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top first=f8e7bc4bc8c2e7eac7f7e31bc116f8e0 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:46,930 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2015-12-23 15:56:46,931 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3515d529acedbaa
2015-12-23 15:56:46,960 INFO [main] zookeeper.ZooKeeper: Session: 0x3515d529acedbaa closed
2015-12-23 15:56:46,960 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

Even though the process finished, the original HFiles were not deleted. I was wondering why the
mv/rename did not happen.

[qihuang.zheng@spark047213 ~]$ hadoop fs -du -h /user/tongdun/id_hbase/1/id/
3.3 G  /user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae
6.0 G  /user/tongdun/id_hbase/1/id/_tmp

tks, qihuang.zheng
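The nested _tmp dirs in the log follow directly from how the grouping loop behaves. A simplified model (my own sketch, not the actual LoadIncrementalHFiles code): a file spanning several regions is split at the end key of the region holding its first key, the bottom half now fits one region, and the top half is retried on the next attempt, so a file covering N regions needs N-1 split passes:

```python
def split_passes(first, last, region_bounds):
    """Count split passes for an HFile [first, last] against sorted region
    end keys; region_bounds must cover the whole keyspace."""
    passes = 0
    while True:
        # end key of the region that contains `first`
        end = next(b for b in region_bounds if b > first)
        if last < end:           # file now fits inside a single region
            return passes
        passes += 1              # split: bottom=[first, end), top=[end, last]
        first = end              # the top half is retried next attempt

# Hypothetical 16 regions with end keys 0x10, 0x20, ..., 0x100
bounds = list(range(0x10, 0x110, 0x10))
print(split_passes(0x01, 0x7f, bounds))  # file spans 8 regions -> 7 passes
```

This matches the log above: each retry attempt loads the current .bottom and splits only the current .top, peeling off one region's worth of data at a time, which is why one file spanning many regions takes so long to load.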