hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12440) Region may remain offline on clean startup under certain race condition
Date Fri, 07 Nov 2014 23:20:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202916#comment-14202916 ]

Andrew Purtell commented on HBASE-12440:

All o.a.h.h.master.** and o.a.h.h.regionserver.** tests pass on 0.98 and branch-1. TestAssignmentManagerOnCluster
passes 10 out of 10 times on 0.98 and branch-1.

Going to push this to both branches shortly.

> Region may remain offline on clean startup under certain race condition
> -----------------------------------------------------------------------
>                 Key: HBASE-12440
>                 URL: https://issues.apache.org/jira/browse/HBASE-12440
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Virag Kothari
>            Assignee: Virag Kothari
>             Fix For: 0.98.8, 0.99.1
>         Attachments: HBASE-12440-0.98.patch, HBASE-12440-0.98_v2.patch, HBASE-12440-branch-1.patch
> Saw this in prod some time back with zk assignment.
> On clean startup, one of the region servers died while the master was doing a bulk assign.
> The bulk assigner then tried to assign the region individually using AssignCallable. The
> AssignCallable does a forceStateToOffline() and skips assigning, as it wants the SSH
> (ServerShutdownHandler) to do the assignment:
> {code}
> 2014-10-16 16:05:23,593 DEBUG master.AssignmentManager [AM.-pool1-t1] : Offline sieve_main:inlinks,com.cbslocal.seattle/photo-galleries/category/consumer///:http\x09com.cbslocal.seattle/photo-galleries/category/tailgate-fan///:http,1413464068567.1f1620174d2542fe7d5b034f3311c3a8., no need to unassign since it's on a dead server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
> 2014-10-16 16:05:23,593  INFO master.RegionStates [AM.-pool1-t1] : Transition {1f1620174d2542fe7d5b034f3311c3a8 state=PENDING_OPEN, ts=1413475519482, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016} to {1f1620174d2542fe7d5b034f3311c3a8 state=OFFLINE, ts=1413475523593, server=gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016}
> 2014-10-16 16:05:23,598  INFO master.AssignmentManager [AM.-pool1-t1] : Skip assigning it is on a dead but not processed yet server: gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
> {code}
> But the SSH won't assign it, as the region is offline but not in transition:
> {code}
> 2014-10-16 16:05:24,606  INFO handler.ServerShutdownHandler [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Reassigning 0 region(s) that gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016 was carrying (and 0 regions(s) that were opening on this server)
> 2014-10-16 16:05:24,606 DEBUG master.DeadServer [MASTER_SERVER_OPERATIONS-hbbl874n38:50510-0] : Finished processing gsbl872n06.blue.ygrid.yahoo.com,50511,1413475494016
> {code}
> In zk-less assignment, the bulk assigner invoking AssignCallable and the SSH may both try
> to assign the region. But as they go through a lock, only one will succeed, so this doesn't
> seem to be an issue.
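
The sequence in the quoted report can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: ToyRegionStates, retryAssign, regionsToReassign and simulate are hypothetical names, and the rules that forcing a region offline clears its in-transition mark and that the shutdown handler only collects regions that are in transition or open on the dead server are simplifications inferred from the description and logs above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OfflineRegionRaceSketch {
    enum State { PENDING_OPEN, OFFLINE, OPEN }

    // Hypothetical stand-in for the master's bookkeeping; not HBase's RegionStates.
    static final class ToyRegionStates {
        final Map<String, State> state = new HashMap<>();
        final Map<String, String> server = new HashMap<>();
        final Set<String> inTransition = new HashSet<>();
    }

    // Models the individual retry after a failed bulk assign (assumed behaviour):
    // force the region offline, then skip if its server is dead but not yet processed.
    static void retryAssign(ToyRegionStates rs, String region, Set<String> deadNotProcessed) {
        rs.state.put(region, State.OFFLINE);
        rs.inTransition.remove(region); // assumption: forcing offline clears the transition mark
        if (deadNotProcessed.contains(rs.server.get(region))) {
            return; // "Skip assigning": leave the region to the shutdown handler
        }
        rs.state.put(region, State.PENDING_OPEN);
        rs.inTransition.add(region);
    }

    // Models the shutdown handler's selection (assumed rule): it only collects regions
    // on the dead server that are either in transition or currently open.
    static List<String> regionsToReassign(ToyRegionStates rs, String deadServer) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : rs.server.entrySet()) {
            String region = e.getKey();
            boolean onDeadServer = deadServer.equals(e.getValue());
            boolean eligible = rs.inTransition.contains(region)
                    || rs.state.get(region) == State.OPEN;
            if (onDeadServer && eligible) {
                result.add(region);
            }
        }
        return result;
    }

    // Replays the order of events from the report with one region and one dead server.
    static List<String> simulate() {
        ToyRegionStates rs = new ToyRegionStates();
        rs.state.put("region-1", State.PENDING_OPEN);
        rs.server.put("region-1", "server-A");
        rs.inTransition.add("region-1");

        Set<String> deadNotProcessed = new HashSet<>();
        deadNotProcessed.add("server-A");              // server dies during bulk assign

        retryAssign(rs, "region-1", deadNotProcessed); // individual retry runs first
        return regionsToReassign(rs, "server-A");      // shutdown handling runs second
    }

    public static void main(String[] args) {
        System.out.println("Reassigning " + simulate().size() + " region(s)");
    }
}
```

In this model the retry and the shutdown handling each defer to the other, so the region ends up OFFLINE with nothing left to assign it, which matches the "Reassigning 0 region(s)" line in the log.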

This message was sent by Atlassian JIRA
