hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Region not splitted
Date Fri, 09 Aug 2013 18:02:38 GMT

Quick question regarding the split.

Let's consider the table 'work_proposed' below:

275164921921  hdfs://node3:9000/hbase/work_proposed

This is a 256GB table. I think there are more than 1B rows in it, but I
have not counted them for a while.
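
As a quick sanity check on the 256GB figure, the byte count reported for the table directory can be converted by hand. This is only a minimal sketch; the helper name bytes_to_gib is made up for illustration, and it assumes "GB" here means 2^30 bytes.

```python
def bytes_to_gib(n_bytes):
    """Convert a raw byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30

# Size reported for hdfs://node3:9000/hbase/work_proposed
table_bytes = 275164921921
print(f"{bytes_to_gib(table_bytes):.1f} GiB")
```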

This table has a pretty default definition:

hbase(main):001:0> describe 'work_proposed'

 'work_proposed', {NAME => '@', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER
TTL => '2147483647', MIN

 _VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'a',

'3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',

 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',

1 row(s) in 0.7590 seconds

Those are all default parameters, which means the default MAX_FILESIZE
value is 10GB.

If I look into Hannibal, it's fine. I can see my table, the regions, the
red line at 10GB showing the max size before the split, etc. All the
regions are under this line.... except one!

hadoop@buldo:~/hadoop-1.0.3$ bin/hadoop fs -ls
Found 1 items
-rw-r--r--   3 hbase supergroup 22911054018 2013-08-03 20:57

This region is 21GB, and it doesn't want to split. The first thing you will
say is that it's because I have one single 21GB row in this region, but I
don't think so. My rows are URLs. I would be surprised if I had a 21GB URL ;)
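
To put a number on how far over the limit this file is, the size column of the listing above can be compared with the 10GB default. This is a rough sketch: the helper ls_size_bytes is hypothetical, it assumes the size is the fifth whitespace-separated field of a `hadoop fs -ls` entry, and it assumes the 10GB limit means 10 * 2^30 bytes.

```python
MAX_FILESIZE = 10 * 2**30  # assumed: the 10GB default, in bytes

def ls_size_bytes(ls_line):
    """Return the size field (5th column) of a `hadoop fs -ls` entry."""
    fields = ls_line.split()
    return int(fields[4])

# Entry copied from the listing above (the path is not needed for the size).
line = "-rw-r--r--   3 hbase supergroup 22911054018 2013-08-03 20:57"
size = ls_size_bytes(line)
ratio = size / MAX_FILESIZE
print(f"{size / 2**30:.1f} GiB, {ratio:.1f}x the limit")
```

So this single file is a bit more than twice the size at which a split would normally be expected.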

I triggered major_compact many times and stopped/started the cluster many
times, but nothing changed. I can most probably ask for a manual split and
that will work, but I want to take this opportunity to figure out why it's
not splitting, whether it should be, and whether there is a defect behind it.

I have not found any exception in the logs. I just started another
major compaction and will grep for the region name in the logs, but any idea
why I'm facing this, and where in the code I should start looking? I can
deploy customized code to show more logs if required. In the meantime, I
will start looking at the split policies...
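
For reference while reading the split policies, here is a toy model of a size-based split check. It is not HBase code: the Store class, the has_references flag and should_split are all hypothetical names, and the rule it encodes (split when some store exceeds the limit, unless a store still holds reference files from an earlier split) is an assumption about how such a policy could behave, to be verified against the real implementation.

```python
from dataclasses import dataclass

MAX_FILESIZE = 10 * 2**30  # assumed: the 10GB default, in bytes

@dataclass
class Store:
    size_bytes: int
    # Hypothetical flag: the store still holds reference files
    # left over from an earlier split.
    has_references: bool = False

def should_split(stores, max_filesize=MAX_FILESIZE):
    """Toy check: split if a store is too big and no store blocks it."""
    if any(s.has_references for s in stores):
        return False
    return any(s.size_bytes > max_filesize for s in stores)

# The 21GB file from this message, with and without a blocking condition.
print(should_split([Store(22911054018)]))
print(should_split([Store(22911054018, has_references=True)]))
```

Under this model, a store well over the limit that still never splits would point to a blocking condition rather than to the size threshold itself, which is one thing to check in the logs.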

