hbase-dev mailing list archives

From Jimmy Xiang <jxi...@cloudera.com>
Subject Re: assignment - is master being a watchdog useful?
Date Thu, 06 Dec 2012 18:35:57 GMT
We can't make the assignment znode ephemeral.  It is used to track
region assignments and for recovery.  For example, suppose a region is
moving from rs A to rs B and, while it is opening on B, both B and the
master die.  If the znode is gone with B, then the new backup master
will think the region is still open on rs A, since A is live and meta
still shows the region on A, which is not the case.
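
To make the scenario concrete, here is a toy sketch of what a newly
started master could infer with and without the znode.  It is plain
Python, not HBase code: the function name, the state strings, and the
dict layout are all made up for illustration, and it assumes A had
already closed the region as part of the move.

```python
# Toy model only: every name here is hypothetical, not an HBase API.

def recovered_location(meta_location, transition_znode, live_servers):
    """Return where a freshly started master would believe the region is.

    meta_location:    server recorded in meta for the region
    transition_znode: dict such as {"state": "OPENING", "server": "B"},
                      or None if no assignment znode exists
    live_servers:     set of servers that are currently alive
    """
    if transition_znode is not None:
        # The znode survived, so the master knows a transition was in
        # flight and can recover it instead of trusting meta.
        return "IN_TRANSITION"
    if meta_location in live_servers:
        # No znode and meta points at a live server: looks healthy.
        return meta_location
    return "UNASSIGNED"


# Region was moving A -> B; B and the old master have died.
meta = "A"
live = {"A"}

# Persistent znode: still present after B dies.
persistent = recovered_location(meta, {"state": "OPENING", "server": "B"}, live)

# Ephemeral znode: gone along with B, as in the scenario above.
ephemeral = recovered_location(meta, None, live)

print(persistent)  # IN_TRANSITION
print(ephemeral)   # A (wrong under the assumption that A closed the region)
```

With the persistent znode the new master sees the pending transition;
without it, nothing distinguishes this region from one that never left A.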


On Thu, Dec 6, 2012 at 10:18 AM, Sergey Shelukhin
<sergey@hortonworks.com> wrote:
> I may be missing some past context here, but why not make it so that the
> assignment zookeeper node is ephemeral, so it dies with the server?
> Then it will be possible to notice there's no more assignment without the
> separate watcher.
> I have conflicting opinions about the current safeguard; on one hand, I've
> seen at least one bug (HBASE-6060) that was fixed (on 0.96 but explicitly
> not in 0.94) that resulted in a region never being assigned (until the 30min
> watcher kicked in, that is).
> On the other hand, making catch-alls for code bugs in this manner seems
> like a bad practice.
> Maybe we can remove it when we have "bulletproof" unit(!) tests for AM that
> take into account various scenarios.
> On Thu, Dec 6, 2012 at 9:26 AM, Jimmy Xiang <jxiang@cloudera.com> wrote:
>> Currently, the RS doesn't watch the znode.  The RS cancels an ongoing
>> open after the master tells it to.
>> Jimmy
>> On Wed, Dec 5, 2012 at 7:53 PM, Stack <stack@duboce.net> wrote:
>> > On Wed, Dec 5, 2012 at 6:57 PM, Jimmy Xiang <jxiang@cloudera.com> wrote:
>> >
>> >> If this region server happens to be hot, it may take a while to open
>> >> it.  If we don't time it out, the server may be even hotter.  If the
>> >> region server could not open it here, other region servers may not be
>> >> able to open it either.
>> >>
>> >
>> >
>> > I suppose the master can still 'timeout' the open if the RS is watching the
>> > znode for the region it is trying to open.  The RS will notice that master
>> > has assumed control in a callback and can then cancel any ongoing open.
>> >
>> > St.Ack
