Message-ID: <2071045134.1261051338571.JavaMail.jira@brutus>
Date: Thu, 17 Dec 2009 12:02:18 +0000 (UTC)
From: "Jaakko Laine (JIRA)"
To: cassandra-commits@incubator.apache.org
Reply-To: cassandra-dev@incubator.apache.org
Subject: [jira] Updated: (CASSANDRA-634) Hinted Handoff Exception
In-Reply-To: <1176441786.1260898578065.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/CASSANDRA-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaakko Laine updated CASSANDRA-634:
-----------------------------------

Attachment: 634-1st-part-gossip-about-all-nodes.patch

The way a node's alive/dead status is currently determined is as follows:

1st time there is gossip about a certain node:
- add the new node to endpoint state and mark it alive (onAlive will be called)
- call onJoin

2nd and subsequent gossip about this node:
- notify the failure detector whenever there is gossip about this endpoint -> the failure detector starts to monitor this node and sets its status to dead if needed (it will not set it to alive)
- the node is marked alive whenever there is gossip about it

The important things here are: (1) a node is assumed to be alive when the first info about it arrives, and (2) the failure detector does not know anything about the node before the second gossip. That means we cannot simply start gossiping info about dead nodes, as their status would remain "alive" forever (that is, until the dead node comes online and activates the failure detector).

Proposed fix (patch attached):

1st time gossip:
- add the new node to endpoint state, but set its status as dead
- call onJoin (token metadata will be updated)

2nd and subsequent gossips:
- unchanged. This 2nd gossip will trigger markAlive (and call onAlive) and activate the failure detector -> normal situation

In short: assume a node to be dead unless proven otherwise by subsequent gossip. If the node is alive, it will be marked so within seconds. If it is dead, we have knowledge of its existence, but we consider it (correctly) to be dead.

There is a possibility of a false "alive" interpretation, though: the cluster has nodes A, B and C. Suppose C has just gossiped to B and dies. At this time C's status in A is different (older) than in B. Now suppose at this instant node D enters the cluster and first gossips with A. In this case D will get the old gossip and only later the new one. This second, newer gossip will cause C to be marked alive even though it is already dead.
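The proposed first/second-gossip handling can be sketched roughly as follows. This is a simplified illustration with hypothetical class and method names, not the actual Gossiper code from the patch; it only demonstrates the "assume dead on first gossip, mark alive on second" rule described above:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed fix (simplified, hypothetical names):
// on the FIRST gossip about a node we record it but assume it is DEAD;
// only a SECOND gossip marks it alive and activates the failure detector.
public class GossipSketch {
    public enum Status { ALIVE, DEAD }

    private final Map<String, Status> endpointState = new HashMap<>();
    private final Map<String, Boolean> monitored = new HashMap<>();

    public void onGossip(String endpoint) {
        if (!endpointState.containsKey(endpoint)) {
            // 1st gossip: add to endpoint state, but set its status as dead
            endpointState.put(endpoint, Status.DEAD);
            onJoin(endpoint); // token metadata would be updated here
        } else {
            // 2nd and subsequent gossip: markAlive + activate failure detector
            endpointState.put(endpoint, Status.ALIVE);
            monitored.put(endpoint, true); // failure detector now watches this node
        }
    }

    private void onJoin(String endpoint) { /* update token metadata */ }

    public Status statusOf(String endpoint) { return endpointState.get(endpoint); }
    public boolean isMonitored(String endpoint) { return monitored.getOrDefault(endpoint, false); }

    public static void main(String[] args) {
        GossipSketch g = new GossipSketch();
        g.onGossip("10.0.0.1");                        // first gossip: known but assumed dead
        System.out.println(g.statusOf("10.0.0.1"));    // DEAD
        g.onGossip("10.0.0.1");                        // second gossip: marked alive
        System.out.println(g.statusOf("10.0.0.1"));    // ALIVE
        System.out.println(g.isMonitored("10.0.0.1")); // true
    }
}
```

If the node is genuinely dead, no second gossip from it ever arrives, so it simply stays in the DEAD state, which is exactly the behavior the fix wants.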
However, since the second gossip will also activate the failure detector, C will be correctly marked as dead within a few seconds, so this is probably OK (and in any case a very rare occurrence).

Two open issues:

- Now that we're gossiping about dead nodes as well, the gossip digest grows without bound as nodes come and go. This information will never disappear, as it will be propagated to new nodes no matter how old and obsolete it is. To counter this, we need some mechanism to (1) remove a dead node from the endpoint state info, or (2) at some point stop gossiping about it, or both.

For (1): when we get a removetoken command, it is probably safe to remove the endpoint immediately (STATE_LEFT is broadcast by a different endpoint, so info about the token removal will remain in the gossiper). Another thing we could do is keep track of nodes that have left. If nothing is heard about such a node for some time, we could assume it is gone for good and remove it from the gossiper after giving its STATE_LEFT enough time to spread.

For (2): we could gossip info only about nodes in either liveEndpoints or unreachableEndpoints (as opposed to endPointStateMap). Nodes are removed from unreachableEndpoints after three days of silence, so this would discard old information from the gossiper. The side effect would of course be that a node that is down for more than three days but comes back later might miss some of its data (nodes that booted after the three-day period would know nothing about it).

(The attached patch should work as such, but does not take these last two issues into account.)

> Hinted Handoff Exception
> ------------------------
>
> Key: CASSANDRA-634
> URL: https://issues.apache.org/jira/browse/CASSANDRA-634
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.5
> Reporter: Chris Goffinet
> Fix For: 0.5
>
> Attachments: 634-1st-part-gossip-about-all-nodes.patch
>
>
> Updated to the latest codebase from cassandra-0.5 branch.
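Option (2) above, building gossip digests only from live and unreachable endpoints with a three-day expiry, could look roughly like this. Again, the names and structure here are simplified assumptions for illustration; the real Gossiper keeps richer per-endpoint state:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of open issue (2): gossip only about live and unreachable
// endpoints, and expire unreachable ones after three days of silence,
// so obsolete dead nodes eventually drop out of gossip entirely.
public class DigestSketch {
    public static final long THREE_DAYS_MS = 3L * 24 * 60 * 60 * 1000;

    final Set<String> liveEndpoints = new HashSet<>();
    // endpoint -> timestamp (ms) when it was last heard from
    final Map<String, Long> unreachableSince = new HashMap<>();

    List<String> makeGossipDigest(long nowMs) {
        // discard unreachable endpoints that have been silent too long
        unreachableSince.values().removeIf(lastHeard -> nowMs - lastHeard > THREE_DAYS_MS);
        // digest covers live nodes plus recently-dead ones, never ancient ones
        List<String> digest = new ArrayList<>(liveEndpoints);
        digest.addAll(unreachableSince.keySet());
        return digest;
    }
}
```

This reproduces the trade-off noted above: a node silent for less than three days is still gossiped about (so its status stays known cluster-wide), while one silent longer simply vanishes from the digest and from the view of any node that boots afterwards.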
> All nodes booted up fine and then I start noticing this error:
>
> ERROR [HINTED-HANDOFF-POOL:1] 2009-12-14 22:05:34,191 CassandraDaemon.java (line 71) Fatal exception in thread Thread[HINTED-HANDOFF-POOL:1,5,main]
> java.lang.RuntimeException: java.lang.NullPointerException
>         at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:253)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.NullPointerException
>         at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:146)
>         at org.apache.cassandra.db.HintedHandOffManager.sendMessage(HintedHandOffManager.java:106)
>         at org.apache.cassandra.db.HintedHandOffManager.deliverAllHints(HintedHandOffManager.java:177)
>         at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:75)
>         at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:249)
>         ... 3 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.