hbase-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Table region got stuck, doesn't move/assign
Date Thu, 19 Jan 2012 22:15:44 GMT
thank you, Michael.

The problem is solved (for now) by moving the region out after restarting
the region server, although we don't really know why it happened or what
happened to that region.

Region server got stuck on any requests to a particular region and
only that one. Master was ok as i realized later. Why it couldn't
immediately move the region, i am not sure; but as soon as we
restarted the region server and switched the table offline/online, it was
able to complete the move/reassign of the region.

The real problem was that it happened to one (apparently random)
region in a region server but not others. Symptoms were the region server
hanging, not returning any scan requests to that region (but not
others). The condition persisted for a long time (several days) and we
did not figure it out until we caught several jobs of low importance
timing out on reading from the table containing that region. The table
experiences asynchronous reads and regular write updates (it's actually
a part of HBL cube).
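
For anyone hitting the same symptom (one region silently hanging on scans while the rest of the server answers), a per-region probe with a timeout would likely surface it much sooner than waiting for a job to time out. Below is a minimal sketch of that idea only. It assumes a hypothetical `probe(region)` callable that performs something like a one-row scan against the given region; `find_unresponsive`, `fake_probe` and the region names are made up for illustration and are not part of HBase or any client library.

```python
import concurrent.futures as cf
import time


def find_unresponsive(regions, probe, timeout_s=5.0):
    """Return the regions whose probe failed or did not finish in time.

    `probe` is a hypothetical callable (assumption): it should issue a
    cheap read, e.g. a one-row scan, against the region it is given.
    """
    if not regions:
        return []
    # One worker per region so a single hung probe cannot delay the others.
    pool = cf.ThreadPoolExecutor(max_workers=len(regions))
    futures = {pool.submit(probe, region): region for region in regions}
    done, not_done = cf.wait(futures, timeout=timeout_s)
    stuck = [futures[f] for f in not_done]
    failed = [futures[f] for f in done if f.exception() is not None]
    # Do not block on hung probes here. Note that worker threads are still
    # joined at interpreter exit, so a real probe needs its own client-side
    # timeout as well.
    pool.shutdown(wait=False)
    return sorted(stuck + failed)


def fake_probe(region):
    # Stand-in for a real scan: "region-b" simulates the hung region.
    if region == "region-b":
        time.sleep(1.0)
    return True


if __name__ == "__main__":
    print(find_unresponsive(["region-a", "region-b", "region-c"],
                            fake_probe, timeout_s=0.2))
```

Run from cron or a monitoring agent, the returned list could feed an alert; the timeout should be set well above normal scan latency for the table.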

I think there's a really low chance we'll ever get to the bottom of
it, so we dropped any further triage attempts at this point. I guess
we also just need to upgrade our HBase stack in prod.

Thank you very much, sir.


On Wed, Jan 18, 2012 at 9:34 AM, Stack <stack@duboce.net> wrote:
> On Mon, Jan 16, 2012 at 3:45 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> i have a table which seems to get stuck in a state where it can't be
>> queried, moved or split/compacted.
> How many regions in this table?  One only?
>> The logs don't have any error statements. Our admin tried hbck to no avail.
> What did your admin see?
>> We stopped the region server, table did not get reassigned (all others
>> did). when browsed in UI, this table just showed "region server
>> offline". (??? shouldn't it get reassigned as others did?)
> Yes.  It should.
>> Bringing region server online loaded it with other regions, but not
>> that table. master apparently still thinks it is on that node (data6)
>> and so all requests are failing with region not serving message.
> So, there is something 'wrong' w/ that table.   Can you track it in
> master log and see what happens when master tries assign it?  Maybe
> it's failing to open?
>> assign/move/unassign commands have no effect (move fails, but
>> assign/unassign seems to be quiet with no apparent effect).
>> Another weirdness: it's the only table that is showing up under
>> hbase/table in zk and its region is listed under /hbase/unassigned.
> Maybe it's stuck in transition?  You should see messages in master log
> if this is the case.
>> Where can i read about meaning and transitions of zookeeper nodes under /hbase ?
> I don't think this is documented in the reference guide (it's a little too
> much detail for most I'd say).  Best place to look is probably source
> code.  See here for an entrance into the wonderful world of
> master/regionserver state transitions:
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/executor/EventHandler.html#93
> St.Ack
