cassandra-commits mailing list archives

From "Michael Shuler (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second
Date Tue, 29 Sep 2015 21:53:04 GMT


Michael Shuler commented on CASSANDRA-10052:

[~sharvanath] when this JIRA ticket shows it has been completed, it will be closed as resolved,
usually with a commit comment. You could certainly build a custom jar with the patch, if you
need to - just check out the latest git release tag, patch, and build.

> Bringing one node down, makes the whole cluster go down for a second
> --------------------------------------------------------------------
>                 Key: CASSANDRA-10052
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sharvanath Pathak
>            Assignee: Stefania
>              Labels: client-impacting
>             Fix For: 2.1.x
> When a node goes down, the other nodes learn of it through gossip.
> And I do see the log from (
> {code}
> private void markDead(InetAddress addr, EndpointState localState)
>    {
>        if (logger.isTraceEnabled())
>            logger.trace("marking as down {}", addr);
>        localState.markDead();
>        liveEndpoints.remove(addr);
>        unreachableEndpoints.put(addr, System.nanoTime());
>        logger.info("InetAddress {} is now DOWN", addr);
>        for (IEndpointStateChangeSubscriber subscriber : subscribers)
>            subscriber.onDead(addr, localState);
>        if (logger.isTraceEnabled())
>            logger.trace("Notified " + subscribers);
>    }
> {code}
> The message "InetAddress is now DOWN" appears in Cassandra's system log.
> Now on all the other nodes the client side (Java driver) says "Cannot connect to any
> host, scheduling retry in 1000 milliseconds". The clients eventually do reconnect, but some
> queries fail during this intermediate period.
> To me it seems that when the server pushes the nodeDown event, it calls getRpcAddress(endpoint)
> and thus sends localhost as the argument in the nodeDown event.
> As in
> {code}
>   public void onDown(InetAddress endpoint)
>        {      
>            server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint),
>        }
> {code}
> getRpcAddress returns localhost for any endpoint if cassandra.yaml uses localhost as the
> value of rpc_address (which, by the way, is the default).
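
The hypothesis in the description can be illustrated with a small, self-contained sketch. It is
not Cassandra or Java driver code: the class NodeDownSketch and the methods reportedRpcAddress,
onNodeDown, and ip are hypothetical names, and the lookup simply models the reported behaviour
(the configured rpc_address is returned for every endpoint). Under that assumption, a node-down
event for a remote endpoint names localhost, so a client that reached its local node through
localhost would drop its only live host.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the reported symptom; not taken from Cassandra or the Java driver.
class NodeDownSketch {

    // Builds an IPv4 address from literal octets without any DNS lookup.
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            // Cannot happen for a 4-byte array, but the checked exception must be handled.
            throw new IllegalStateException(e);
        }
    }

    // Assumption from the report: the configured rpc_address is returned
    // regardless of which endpoint was asked about.
    static InetAddress reportedRpcAddress(InetAddress endpoint, InetAddress configuredRpcAddress) {
        return configuredRpcAddress;
    }

    // Hypothetical client-side handler: stop using whatever host the event names.
    static void onNodeDown(Set<InetAddress> liveHosts, InetAddress downHost) {
        liveHosts.remove(downHost);
    }

    public static void main(String[] args) {
        InetAddress localhost = ip(127, 0, 0, 1);
        InetAddress remoteNode = ip(10, 0, 0, 2);

        // The client only knows its local node, reached through localhost.
        Set<InetAddress> liveHosts = new HashSet<>();
        liveHosts.add(localhost);

        // The remote node goes down, but the event carries localhost.
        InetAddress announced = reportedRpcAddress(remoteNode, localhost);
        onNodeDown(liveHosts, announced);

        System.out.println("announced=" + announced.getHostAddress()
                + " liveHosts=" + liveHosts.size());
    }
}
```

In this model the client ends up with no live hosts even though only the remote node went down,
which would be consistent with the "Cannot connect to any host" message quoted above.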
