hbase-user mailing list archives

From Austin Heyne <ahe...@ccri.com>
Subject Bulk Load running files multiple times
Date Thu, 28 Jun 2018 20:23:59 GMT
Hey again,

I'm running a bulk load on S3 and I'm seeing region servers being 
instructed to load an HFile multiple times. I've seen this behavior in 
two different ways. The first time was after deciding to brute-force my 
way around the problem in HBASE-20774 [1] by just letting the cluster 
copy the data down from S3 and back up. After letting this run 
overnight, the 15 TB of input data had grown to over 60 TB in the HBase 
catalog. I noticed in the logs that files were being copied multiple 
times (around 10 times for some files on a single region server).
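
For context, the loads go through the normal bulk-load client path 
(LoadIncrementalHFiles), roughly like this. This is a simplified sketch 
rather than the exact job; the table name and staging path are the ones 
that show up in the logs below, everything else is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName tableName = TableName.valueOf("z3_v2");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin();
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      // Staging directory containing one subdirectory per column family
      // (here 'd') full of HFiles, as written by the bulk-load job.
      Path staging = new Path("s3://bucket_name/data/bulkload/z3");
      loader.doBulkLoad(staging, admin, table, locator);
    }
  }
}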

After writing a patch for HBASE-20774 and deploying it to the cluster, 
on the next attempt I'm seeing the same behavior, but because HBase 
renames files rather than copying them when you bulk load from the same 
FS as the HBase root, it now causes FileNotFoundExceptions the next time 
the region server is instructed to load the file. I've cherry-picked the 
logs of one occurrence of this off of a region server:

2018-06-27 19:34:18,170 INFO 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.HStore: Validating hfile at 
s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for 
inclusion in store d region z3_v2,\x00\x09\xD9 
\x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
2018-06-27 19:34:18,236 INFO 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
s3n.S3NativeFileSystem: Opening 
's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe' 
for reading
2018-06-27 19:34:45,915 INFO 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
s3n.S3NativeFileSystem: rename 
s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe 
s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_
2018-06-27 19:34:59,005 INFO 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.HStore: Loaded HFile 
s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe 
into store 'd' as 
s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_ - updating store file list.
2018-06-27 19:34:59,101 INFO 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.HStore: Successfully loaded store file 
s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe 
into store d (new location: 
s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_)
2018-06-27 19:40:03,463 INFO 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.HStore: Validating hfile at 
s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for 
inclusion in store d region z3_v2,\x00\x09\xD9 
\x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
org.apache.hadoop.io.MultipleIOException: 18 exceptions 
[java.io.FileNotFoundException: No such file or directory 
's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe', 
....]
at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:53)
at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5782)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2170)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36615)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
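
To illustrate what I mean about rename vs. copy: when the bulkload 
staging directory is on the same filesystem as the HBase root, the 
region server moves the HFile into the region directory instead of 
copying it, so any later load request for the same source path can only 
fail. A simplified sketch of that decision (not the actual HStore code, 
just the shape of it, using plain Hadoop FileSystem calls):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class BulkLoadCommitSketch {

  // Commit a bulk-loaded HFile into its store directory. Sketch only.
  public static void commitHFile(Configuration conf, Path srcHFile, Path dstHFile)
      throws IOException {
    FileSystem srcFs = srcHFile.getFileSystem(conf);
    FileSystem dstFs = dstHFile.getFileSystem(conf);

    if (srcFs.getUri().equals(dstFs.getUri())) {
      // Same filesystem (e.g. both under the s3:// root): the file is
      // *moved*, so the original bulkload path no longer exists afterwards.
      // A second bulkLoadHFile request for srcHFile then hits the
      // FileNotFoundException shown in the log above.
      if (!srcFs.rename(srcHFile, dstHFile)) {
        throw new IOException("rename failed: " + srcHFile + " -> " + dstHFile);
      }
    } else {
      // Different filesystems: the file is copied and the source is left
      // in place, so repeated load requests silently duplicate data,
      // which matches the 15 TB of input growing past 60 TB in the first run.
      FileUtil.copy(srcFs, srcHFile, dstFs, dstHFile, false /* deleteSource */, conf);
    }
  }
}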

I'm not sure where the problem is coming from here, whether the master 
is adding files to the queue multiple times or just not removing them. 
If anyone has knowledge of this or an idea of where things are going 
wrong, it would be very helpful, since I'm not super familiar with the 
code base.

Thanks,
Austin

[1] https://issues.apache.org/jira/browse/HBASE-20774

