hbase-dev mailing list archives

From "Nick Dimiduk (Jira)" <j...@apache.org>
Subject [jira] [Created] (HBASE-24293) Assignment manager should never give up assigning meta
Date Thu, 30 Apr 2020 18:22:00 GMT
Nick Dimiduk created HBASE-24293:
------------------------------------

             Summary: Assignment manager should never give up assigning meta
                 Key: HBASE-24293
                 URL: https://issues.apache.org/jira/browse/HBASE-24293
             Project: HBase
          Issue Type: Bug
          Components: master, Region Assignment
    Affects Versions: 2.3.0
            Reporter: Nick Dimiduk


Not yet sure how we got here, but,

{noformat}
2020-04-29 22:39:16,140 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: pid=308, state=RUNNABLE:SERVER_CRASH_ASSIGN_META, locked=true; ServerCrashProcedure server=host-a.example.com,16020,1588033841562, splitWal=true, meta=true found a region state=OFFLINE, location=null, table=hbase:meta, region=1588230740 which is no longer on us host-a.example.com,16020,1588033841562, give up assigning...
{noformat}

Assignment manager gives up on this procedure and nothing can progress. Manual intervention
is necessary.

From this [conditional block|https://github.com/apache/hbase/blob/1415a82d41a1e125440014a4b23364371b30d065/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L475],
it seems the {{regionNode}} location is {{null}}.

{noformat}
        // This is possible, as when a server is dead, TRSP will fail to schedule a RemoteProcedure
        // to us and then try to assign the region to a new RS. And before it has updated the region
        // location to the new RS, we may have already called the am.getRegionsOnServer so we will
        // consider the region is still on us. And then before we arrive here, the TRSP could have
        // updated the region location, or even finished itself, so the region is no longer on us
        // any more, we should not try to assign it again. Please see HBASE-23594 for more details.
        if (!serverName.equals(regionNode.getRegionLocation())) {
          LOG.info("{} found a region {} which is no longer on us {}, give up assigning...", this,
            regionNode, serverName);
          continue;
        }
{noformat}
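
To illustrate why a {{null}} location lands in this branch, here is a minimal, self-contained sketch. It is not the HBase code: {{MetaLocationCheckSketch}} and {{skipsAssignment}} are hypothetical names, and plain strings stand in for {{ServerName}}, on the assumption that {{ServerName.equals(null)}} returns {{false}} as the general {{Object.equals}} contract requires.

```java
public class MetaLocationCheckSketch {

    // Hypothetical stand-in for the condition quoted above.
    // crashedServer plays the role of serverName; regionLocation plays the
    // role of regionNode.getRegionLocation(), which may be null.
    public static boolean skipsAssignment(String crashedServer, String regionLocation) {
        // equals(null) is false, so a null location always satisfies the negated check.
        return !crashedServer.equals(regionLocation);
    }

    public static void main(String[] args) {
        String crashed = "host-a.example.com,16020,1588033841562";

        // Region still recorded on the crashed server: the check does not skip it.
        System.out.println(skipsAssignment(crashed, crashed)); // prints "false"

        // Region location is null, as in the log above: the check skips it.
        System.out.println(skipsAssignment(crashed, null)); // prints "true"
    }
}
```

Under that assumption, the check cannot tell a region whose location is {{null}} apart from one that a TRSP already moved to another server, which is consistent with the "give up assigning" log line for hbase:meta.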



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
