incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: messages stopped for 3 minutes?
Date Sun, 25 Sep 2011 13:33:15 GMT
What makes you think the problem is on the receiving node, rather than
the sending node?
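
One way to compare the two sides is to look for gaps in the debug
output on both nodes. Below is a minimal sketch, not anything that
ships with Cassandra. It assumes each log line carries a timestamp of
the form yyyy-MM-dd HH:mm:ss,SSS and uses the "notify failure detector"
marker from the patch quoted below. GossipGapFinder, findGaps and the
30-second threshold are made-up names and values for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports silent periods between matching log lines.
public class GossipGapFinder {
    // Assumed timestamp layout, e.g. "2011-09-25 01:19:00,123".
    private static final Pattern TS =
        Pattern.compile("(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})");
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns one entry per gap longer than the threshold between
    // consecutive lines that contain the marker.
    public static List<String> findGaps(List<String> lines, String marker,
                                        Duration threshold) {
        List<String> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            if (!line.contains(marker)) {
                continue; // not one of the lines we are tracking
            }
            Matcher m = TS.matcher(line);
            if (!m.find()) {
                continue; // no parsable timestamp on this line
            }
            LocalDateTime current = LocalDateTime.parse(m.group(1), FMT);
            if (previous != null
                    && Duration.between(previous, current).compareTo(threshold) > 0) {
                gaps.add(previous + " -> " + current);
            }
            previous = current;
        }
        return gaps;
    }

    // Usage: java GossipGapFinder <log file> <marker>
    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String gap : findGaps(lines, args[1], Duration.ofSeconds(30))) {
            System.out.println(gap);
        }
    }
}
```

Running this against the logs of both nodes over the same window shows
whether the silent period appears on one side only or on both.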

On Sun, Sep 25, 2011 at 1:19 AM, Yang <teddyyyy123@gmail.com> wrote:
> I constantly see TimedOutException, followed by
> UnavailableException, in my logs,
> so I added some extra debugging to Gossiper.notifyFailureDetector()
>
>
>
>    void notifyFailureDetector(InetAddress endpoint, EndpointState remoteEndpointState)
>    {
>        IFailureDetector fd = FailureDetector.instance;
>        EndpointState localEndpointState = endpointStateMap.get(endpoint);
>        logger.debug("notify failure detector");
>        /*
>         * If the local endpoint state exists then report to the FD only
>         * if the versions work out.
>         */
>        if ( localEndpointState != null )
>        {
>            logger.debug("notify failure detector, endpoint");
>            int localGeneration = localEndpointState.getHeartBeatState().getGeneration();
>            int remoteGeneration = remoteEndpointState.getHeartBeatState().getGeneration();
>            if ( remoteGeneration > localGeneration )
>            {
>                localEndpointState.updateTimestamp();
>                logger.debug("notify failure detector --- report 1");
>                fd.report(endpoint);
>                return;
>            }
>
>
>
>
> Then I found that this method stopped being called for a period of 3
> minutes, so of course the detector considers the other side to be
> dead.
>
> But since these 2 boxes are in the same EC2 region and the same security
> group, a network issue lasting that long seems unlikely. So I
> ran a background job that just does
>
> echo | nc $the_other_box 7000   in a loop
>
> and this always works fine; it never fails to reach port 7000.
>
>
> So somehow the messages were not delivered or not received. How could I debug this?
> (extra logging attached)
>
> Thanks
> Yang
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
