hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Table with 80 regions having nearly no data in it
Date Fri, 19 Dec 2008 02:23:37 GMT
Thibaut_ wrote:
> Hello St.Ack,
> thanks for your answer.
> I will vote on HBASE-880, that's exactly what I needed :)
> Except for the little glitch in the web interface regarding the ending key,
> that table seems to be OK then. I thought regions would get deleted. I'm
> adding new data (the current timestamp is the key), and once I have
> processed that data, I delete it. That's why I get more
> and more regions (as soon as I add data that spans two
> regions, the first region isn't removed after I have deleted that data).

We'd need a cluster cleaner process that would notice empty regions and 
merge them into adjacent ones.  That seems reasonable to me (sounds 
like HBASE-420).
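Here is a minimal sketch of the key-range bookkeeping such a cleaner pass might do. It assumes the region list is sorted and contiguous, and that each empty region simply folds into an adjacent one. The function name and the `(start_key, end_key, row_count)` tuple layout are hypothetical. It calls no HBase API and makes no claim about how HBASE-420 would actually be implemented.

```python
from typing import List, Tuple

# (start_key, end_key, row_count); "" as a start means table start, "" as an end means table end.
RegionInfo = Tuple[str, str, int]


def merge_empty_regions(regions: List[RegionInfo]) -> List[RegionInfo]:
    """Hypothetical cleaner pass: fold every empty region into an adjacent region.

    Assumes `regions` is sorted and contiguous (each end key equals the next start key).
    """
    merged: List[RegionInfo] = []
    for start, end, rows in regions:
        if merged and (rows == 0 or merged[-1][2] == 0):
            # Either this region is empty, or the one before it is (e.g. a leading
            # empty region): extend the previous region's end key to cover this one.
            prev_start, _, prev_rows = merged[-1]
            merged[-1] = (prev_start, end, prev_rows + rows)
        else:
            merged.append((start, end, rows))
    return merged


print(merge_empty_regions([("", "b", 5), ("b", "c", 0), ("c", "d", 0), ("d", "", 3)]))
# [('', 'd', 5), ('d', '', 3)]
```

Merging into the preceding region keeps the pass to a single left-to-right scan. Only a leading empty region has no predecessor, so it absorbs its successor instead. The key space covered stays exactly the same.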

The below is a bit hard to read, but from what I can make out, it's not 
right.  The ENDKEY of a region should be the STARTKEY of the next, and the 
region whose ENDKEY is '' should be the last in the table.  Neither seems 
to be the case here.  Perhaps file an issue and paste in clean output from 
.META., and I'll take a look.
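The invariant described above can be checked mechanically once the start and end keys have been pulled out of a `scan '.META.'` listing. This sketch assumes the keys are already extracted into `(start_key, end_key)` pairs in table order. The function name is hypothetical, and the extra check that the first region starts at '' reflects how HBase tables normally begin rather than anything stated in this message.

```python
def check_region_chain(regions):
    """Return a list of problems found in a table's ordered (start_key, end_key) pairs.

    Invariants checked: the first region starts at "", each region's end key equals
    the next region's start key, and only the last region has the empty end key "".
    """
    if not regions:
        return ["table has no regions"]
    problems = []
    if regions[0][0] != "":
        problems.append(f"first region starts at {regions[0][0]!r}, expected ''")
    for i, ((start, end), (next_start, _)) in enumerate(zip(regions, regions[1:])):
        if end == "":
            problems.append(f"region {i} ({start!r}) has an empty end key but is not last")
        elif end != next_start:
            problems.append(
                f"hole or overlap: region {i} ends at {end!r}, next starts at {next_start!r}"
            )
    if regions[-1][1] != "":
        problems.append(f"last region ends at {regions[-1][1]!r}, expected ''")
    return problems


# Hypothetical keys: region 1 ends at '200' but the next region starts at '250'.
print(check_region_chain([("", "100"), ("100", "200"), ("250", "")]))
```

An empty result means the chain is consistent. Any entry points at the row in `.META.` worth pasting into the issue.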
> Here is the meta info of that region (NAME =>
> 'tobeprocessed,12293840182411045696639,1229385024829') and the regions
> around that region
> .........
>  tobeprocessed,1229383104274796785789,1229384269280
>    column=info:regioninfo, timestamp=1229384271404,
>    value=REGION => {NAME => 'tobeprocessed,1229383104274796785789,1229384269280',
>      STARTKEY => '1229383104274796785789', ENDKEY => '12293837601871695303679',
>      ENCODED => 1839260151, TABLE => {{NAME => 'tobeprocessed', IS_ROOT => 'false',
>      IS_META => 'false', FAMILIES => [{NAME => 'data', BLOOMFILTER => 'false',
>      COMPRESSION => 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL => '-1',
>      IN_MEMORY => 'false', BLOCKCACHE => 'false'}], INDEXES => []}}
>  tobeprocessed,1229383760187 column=info:regioninfo,

> But I have another table (rsssources) which, when I scanned it yesterday, had
> more than 400 000 entries (count 'rsssources'), and today has only 180 000
> entries (after killing hbase with kill -9, because it was unresponsive).

Well, the above listing would seem to have missing regions.  Let's figure 
out what's going on.  File an issue titled 'missing regions' and paste in 
the output of scan '.META.'.  Do you think you can reproduce this?

> I'm also logging the DEBUG entries now to see what's happening at that
> point.

Good.  Can you see anything about the regions that go missing?

> When I execute a mapreduce job, a few regions don't seem to have any data in
> them. However, I never deleted any data in that table (I only replaced it).
> I did increase the timeout values, because I read somewhere that it
> would help in some cases, but I will reset them to their original
> values.
> What's the best way to stop hbase when the hbase-stop script doesn't work?
> (Sometimes it just runs for hours, probably a deadlock somewhere.)

Tail the master log with DEBUG enabled.  It should point at what is taking 
so long to go down, and may implicate a particular regionserver.  Thread 
dump it ("kill -QUIT PID") and stick that in the issue too.

> I'm waiting now for hbase to shut down, and will try to run the merge script
> on those two tables.
Good stuff,
