hbase-user mailing list archives

From Thibaut_ <tbr...@blue.lu>
Subject Table with 80 regions having nearly no data in it
Date Wed, 17 Dec 2008 18:31:39 GMT


I upgraded to the trunk version (0.19.0-dev, r726278) of HBase, and I'm
pretty happy with it.

Also, the lockups I had before seem to have gone away once I stopped
creating a new instance of HTable on each update/delete/exists call. I also
batched up multiple writes to the tables (one commit per 1000 rows), which
seemed to further reduce the load on the servers. My jobs finally run to
completion, pretty good :-)
As a side note, it would be helpful if BatchUpdate could not only insert or
change rows but also delete them (so I could delete rows at the end while
executing the other batched requests as well).

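For illustration, a minimal sketch of the pattern described above (reuse one HTable, group commits), written against the 0.19-era client API from memory; the `List<BatchUpdate>` overload of `commit()` and the exact class names are assumptions and may differ in your build, and this obviously needs a running cluster with the `tobeprocessed` table:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class BatchWriter {
    public static void main(String[] args) throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        // Create the HTable once up front and reuse it for every write,
        // instead of constructing a new instance per update/delete/exists.
        HTable table = new HTable(conf, "tobeprocessed");

        List<BatchUpdate> batch = new ArrayList<BatchUpdate>(1000);
        for (long i = 0; i < 10000; i++) {
            BatchUpdate bu = new BatchUpdate("row-" + i);
            bu.put("data:value", ("v" + i).getBytes());
            batch.add(bu);
            // One commit per 1000 rows, as in the setup described above.
            if (batch.size() == 1000) {
                table.commit(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            table.commit(batch);  // flush the remainder
        }
    }
}
```

If the list-based `commit()` is not available, committing each BatchUpdate individually still benefits from reusing the single HTable instance.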
But I have noticed something odd with at least one table:

tobeprocessed 	{NAME => 'tobeprocessed', IS_ROOT => 'false', IS_META =>
'false', FAMILIES => [{NAME => 'data', BLOOMFILTER => 'false', COMPRESSION
=> 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY
=> 'false', BLOCKCACHE => 'false'}], INDEXES => []}

which spans over 70 regions but only holds about 117 rows (just a few
MByte). These entries are all in the last region (I used a timestamp as the
key, and I verified this with a MapReduce job). On the web status page there
are also 2 regions with an empty end key, which seems very strange: one at
the end and one near the middle. When I ran a MapReduce job over this table,
however, the region split's start key was set to the start key of the next
region (for the first region with an empty end key in the web interface).
(As a side note, stopping HBase sometimes took very long, so I manually
killed the processes a few times before, which could have led to this...)

Shouldn't the regions be deleted when they contain no data? (I have set
versions to 1 and deleted the keys through the HTable.deleteAll() function.)
I manually ran a compaction through the web interface without specifying any
key, but this didn't remove the regions (the compactions were executed on
the region servers, as per the log, and took 0 seconds to complete as there
was no data in them).

Also, the startup phase is a lot longer than with hadoop 0.18.1. I have
about 1500 regions over 7 servers, and it can take up to 5 minutes until all
regions are loaded. (HBase doesn't even start to load regions until I make a
first request to it.) But this could also be related to corrupt
HBase settings (the relevant descriptions from my configuration):

- HMaster server lease period in milliseconds. Default is 120 seconds.
  Region servers must report in within this period else they are considered
  dead. On a loaded cluster, may need to up this.
- HRegion server lease period in milliseconds. Default is 60 seconds.
  Clients must report in within this period else they are considered dead.
- If more than this number of HStoreFiles in any one HStore (one HStoreFile
  is written per flush of memcache) then a compaction is run to rewrite all
  HStoreFiles as one. Larger numbers put off compaction, but when it runs,
  it takes longer to complete. During a compaction, updates cannot be
  flushed to disk. Long compactions require memory sufficient to carry the
  logging of all updates across the duration of the compaction. If too
  large, clients time out during compaction.
- Max number of HStoreFiles to compact per 'minor' compaction.

Thanks for your help,
View this message in context: http://www.nabble.com/Table-with-80-regions-having-nearly-no-data-in-it-tp21058687p21058687.html
Sent from the HBase User mailing list archive at Nabble.com.
