hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3282) Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster
Date Mon, 29 Nov 2010 19:05:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964879#action_12964879 ]

HBase Review Board commented on HBASE-3282:

Message from: "Jonathan Gray" <jgray@apache.org>

This is an automatically generated e-mail. To reply, visit:

Review request for hbase and stack.


We currently let go of dead servers once we finish their shutdown.  We should hang on to them
longer to deal with things like network partitions.

I'm not a fan of SoftReferences, so I decided on another approach.  DeadServer now caps the
number of servers it holds in its set (default 100).  Once it reaches the max, it evicts
the oldest.
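For readers skimming the thread, a bounded "evict the oldest" set like the one described can be sketched with an insertion-ordered LinkedHashMap whose removeEldestEntry hook drops the oldest entry once a cap is exceeded. This is a hypothetical illustration, not the code in the patch (see the diff link for that); the class name BoundedDeadServers, its methods, and the example server names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the actual HBASE-3282 patch): a set of dead
 * server names that holds at most maxDeadServers entries and evicts the
 * oldest one once that cap is exceeded.
 */
public class BoundedDeadServers {
  private final Map<String, Boolean> deadServers;

  public BoundedDeadServers(final int maxDeadServers) {
    // accessOrder=false keeps insertion order, so the "eldest" entry is the
    // server that was added first. removeEldestEntry runs after each put.
    this.deadServers = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > maxDeadServers;
      }
    };
  }

  /** Remember a server as dead; may evict the oldest remembered server. */
  public synchronized void add(String serverName) {
    deadServers.put(serverName, Boolean.TRUE);
  }

  /** True if the server is still remembered as dead. */
  public synchronized boolean isDeadServer(String serverName) {
    return deadServers.containsKey(serverName);
  }

  public synchronized int size() {
    return deadServers.size();
  }

  public static void main(String[] args) {
    BoundedDeadServers dead = new BoundedDeadServers(2);
    dead.add("rs1,60020,1001");
    dead.add("rs2,60020,1002");
    dead.add("rs3,60020,1003"); // cap of 2 exceeded: rs1, the oldest, is evicted
    System.out.println(dead.isDeadServer("rs1,60020,1001")); // false
    System.out.println(dead.isDeadServer("rs3,60020,1003")); // true
    System.out.println(dead.size()); // 2
  }
}
```

The trade-off is the one the patch accepts: memory stays bounded, but a server that is evicted from the set would no longer be recognized as dead.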

More code than I had hoped, but nothing too crazy.

This addresses bug HBASE-3282.


  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java 1040221 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1040221 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1040221 

Diff: http://review.cloudera.org/r/1259/diff


Running unit tests now.



> Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster
> -----------------------------------------------------------------------------------------------------
>                 Key: HBASE-3282
>                 URL: https://issues.apache.org/jira/browse/HBASE-3282
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>             Fix For: 0.90.0, 0.92.0
> Currently we clear a server from the deadserver set once we finish processing its shutdown.  However, certain circumstances (network partitions, race conditions) could delay the RS's check-in until after the shutdown has been processed.  As things stand, that RS will be let back into the cluster rather than rejected with YouAreDeadException.
> We should hang on to the dead servers so we always reject them.
> One concern is that the set will grow indefinitely.  One recommendation by stack is to use SoftReferences.
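The rejection behavior the issue asks for amounts to a membership check on every region server check-in. The sketch below is hypothetical: CheckInGuard, expireServer, and checkIn are illustrative names, not HBase's ServerManager API, and YouAreDeadException here is a local stand-in for the HBase class of the same name. Its set is deliberately unbounded, which is exactly the growth concern raised above.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the master-side check described in HBASE-3282:
 * once a region server has been expired, any later check-in from that
 * same server instance is rejected.
 */
public class CheckInGuard {

  /** Local stand-in for HBase's YouAreDeadException (illustration only). */
  public static class YouAreDeadException extends IOException {
    public YouAreDeadException(String message) {
      super(message);
    }
  }

  // Expired servers are retained here instead of being cleared when their
  // shutdown processing finishes. Unbounded in this sketch.
  private final Set<String> deadServers =
      Collections.synchronizedSet(new HashSet<String>());

  /** Called when the master expires a region server. */
  public void expireServer(String serverName) {
    deadServers.add(serverName);
  }

  /** Called on each region server check-in; rejects expired instances. */
  public void checkIn(String serverName) throws YouAreDeadException {
    if (deadServers.contains(serverName)) {
      throw new YouAreDeadException(
          "Server " + serverName + " was already expired; rejecting check-in");
    }
    // Normal check-in handling would continue here (omitted).
  }

  public static void main(String[] args) {
    CheckInGuard master = new CheckInGuard();
    master.expireServer("rs1,60020,1001"); // example name, made up
    try {
      master.checkIn("rs1,60020,1001");
      System.out.println("accepted");
    } catch (YouAreDeadException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the check runs on every check-in rather than only during shutdown processing, a region server that reappears after a network partition is still turned away, as long as it remains in the set.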

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
