incubator-cassandra-user mailing list archives

From Yang <>
Subject messages stopped for 3 minutes?
Date Sun, 25 Sep 2011 06:19:48 GMT
I constantly see TimedOutException, followed by
UnavailableException, in my logs,
so I added some extra debugging to Gossiper.notifyFailureDetector():

    void notifyFailureDetector(InetAddress endpoint, EndpointState
        IFailureDetector fd = FailureDetector.instance;
        EndpointState localEndpointState = endpointStateMap.get(endpoint);
        logger.debug("notify failure detector");
         * If the local endpoint state exists then report to the FD only
         * if the versions workout.
        if ( localEndpointState != null )
                logger.debug("notify failure detector, endpoint");
            int localGeneration =
            int remoteGeneration =
            if ( remoteGeneration > localGeneration )
                logger.debug("notify failure detector --- report 1");

I then found that this method stopped being called for a period of 3
minutes, so of course the failure detector considers the other side to be
down.
But since these 2 boxes are in the same EC2 region and the same security
group, there is no reason for a network issue to last that long. So I
ran a background job that just runs this in a loop:

    echo | nc $the_other_box 7000

This always works fine and never fails to contact port 7000.
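
The same connectivity check can be done from Java so that each attempt carries a latency value that can be logged next to the gossip debug output. This is only a sketch: the class name `PortProbe` and the method `connectMillis` are hypothetical, and a successful connect shows only that the port accepts TCP connections, not that gossip messages are being delivered.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical probe: times a plain TCP connect to host:port. */
final class PortProbe {
    /**
     * Returns the time taken to connect in milliseconds,
     * or -1 if the connection failed or timed out.
     */
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            // Refused, unreachable, or timed out.
            return -1L;
        }
    }
}
```

Calling PortProbe.connectMillis(host, 7000, 1000) once per second and logging the result with a timestamp would show whether connect latency changes during the 3-minute window, even if the connects themselves keep succeeding.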

So somehow the messages were not being delivered or received. How can I debug this?
(Extra logging attached.)

