hbase-dev mailing list archives

From "Michael Stack (Jira)" <j...@apache.org>
Subject [jira] [Created] (HBASE-23282) HBCKServerCrashProcedure for 'Unknown Servers'
Date Wed, 13 Nov 2019 00:51:00 GMT
Michael Stack created HBASE-23282:
-------------------------------------

             Summary: HBCKServerCrashProcedure for 'Unknown Servers'
                 Key: HBASE-23282
                 URL: https://issues.apache.org/jira/browse/HBASE-23282
             Project: HBase
          Issue Type: Bug
          Components: hbck2, proc-v2
    Affects Versions: 2.2.2
            Reporter: Michael Stack


Under an overdriving, sustained load, I can fairly easily manufacture an hbase:meta table that
references servers that are neither in the live list nor members of deadservers; i.e.
'Unknown Servers'.  The new 'HBCK Report' UI in Master has a section that lists any 'Unknown
Servers' found in hbase:meta.

Once in this state, the repair is awkward. Our assign/unassign Procedure is particularly dogged
about insisting that we confirm close/open of Regions when it is going about its business,
which is well and good if the server is in the live/dead sets, but with an 'Unknown Server' we
invariably end up trying to confirm against a no-longer-present server (more on this in
follow-on issues).

What is wanted is queuing of a ServerCrashProcedure for each 'Unknown Server'. It would split
any WALs (there shouldn't be any if the server was restarted) and ideally it would cancel out
any assigns and reassign regions off the 'Unknown Server'.  But the 'normal' SCP consults
the in-memory cluster state to figure out what Regions were on the crashed server, which works
fine for the usual case. 'Unknown Servers' have no state in the Master's in-memory Maps of
Servers to Regions or in the DeadServers list, so the normal SCP finds nothing to reassign.

The suggestion here is that hbck2 be able to drive in a special SCP, one which would get its
list of Regions by scanning hbase:meta rather than asking Master memory; an HBCKSCP.
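
To make the idea concrete, here is a minimal sketch of the lookup an HBCKSCP would need. It is
not HBase code: hbase:meta is modelled as a plain Map of region name to server name, and the
class and method names (UnknownServerSketch, unknownServers, regionsOnServer) are hypothetical,
chosen only for illustration. It shows that both the 'Unknown Servers' and the Regions to
reassign can be derived from the meta contents alone, without consulting Master memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in, not an HBase class: hbase:meta is modelled as
// region name -> server name, both as plain strings.
final class UnknownServerSketch {

    // Servers referenced by meta that are in neither the live nor the dead set.
    static Set<String> unknownServers(Map<String, String> metaRegionToServer,
                                      Set<String> liveServers,
                                      Set<String> deadServers) {
        Set<String> unknown = new TreeSet<>();
        for (String server : metaRegionToServer.values()) {
            if (server == null) {
                continue; // region has no server recorded; nothing to check
            }
            if (!liveServers.contains(server) && !deadServers.contains(server)) {
                unknown.add(server);
            }
        }
        return unknown;
    }

    // Regions that meta says are on the given server, found by scanning every
    // meta row rather than asking an in-memory server-to-regions map.
    static List<String> regionsOnServer(Map<String, String> metaRegionToServer,
                                        String server) {
        List<String> regions = new ArrayList<>();
        for (Map.Entry<String, String> row : metaRegionToServer.entrySet()) {
            if (server.equals(row.getValue())) {
                regions.add(row.getKey());
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("region-a", "rs1,16020,100");
        meta.put("region-b", "rs2,16020,90"); // rs2 is neither live nor dead
        meta.put("region-c", "rs2,16020,90");

        Set<String> live = Set.of("rs1,16020,100");
        Set<String> dead = Set.of();

        for (String server : unknownServers(meta, live, dead)) {
            // prints: rs2,16020,90 -> [region-b, region-c]
            System.out.println(server + " -> " + regionsOnServer(meta, server));
        }
    }
}
```

A real HBCKSCP would do the equivalent scan against hbase:meta itself and then hand the
resulting Regions to the usual reassign steps; the sketch covers only the lookup.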



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
