incubator-cassandra-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: frequent node UP/Down?
Date Wed, 28 Sep 2011 02:33:51 GMT
It looks like the new connections are created by the sending side
(OutboundTcpConnection.java) when it detects an IOException on
write().

Since these timeouts happen rather frequently, about 10 to 20 times
per hour, I wonder whether they are really due to the network in EC2, and
I would really like some way to ascertain that (like some logging in dmesg
saying "connection dropped", etc.). Maybe I need an extensive
tcpdump analysis session, which is a big pain.
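
One thing that can be checked without tcpdump is what the EOFException in the
quoted trace below implies at the socket level. The following is only a
minimal sketch, not Cassandra code: the class name EofDemo and the method
peerCloseCausesEof are made up for illustration, and it uses a loopback
socket rather than a real inter-node connection. It shows that
DataInputStream.readInt() throws EOFException when the peer closes the
connection in an orderly way, which is what a sender that closes and
reconnects would produce on the receiving side.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EofDemo {
    /**
     * Returns true if readInt() on the accepting side throws EOFException
     * after the connecting side closes its socket without writing anything.
     */
    static boolean peerCloseCausesEof() throws IOException {
        // Port 0 lets the OS pick a free port; loopback only.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Hypothetical "sender": connects, then closes below.
            Socket sender = new Socket(server.getInetAddress(), server.getLocalPort());
            try (Socket receiver = server.accept();
                 DataInputStream in = new DataInputStream(receiver.getInputStream())) {
                sender.close();   // orderly close of the sending side
                in.readInt();     // receiver sees end of stream before 4 bytes arrive
                return false;     // not reached if EOFException is thrown
            } catch (EOFException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("EOF on peer close: " + peerCloseCausesEof());
    }
}
```

If this reading is right, an EOF on the receiving side by itself only says
that the other end closed the stream. In Java an abrupt reset usually shows
up as a SocketException ("Connection reset") rather than EOFException, so the
sender's log around the same timestamp may be the more useful place to look
for the original write() failure.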




On Tue, Sep 27, 2011 at 7:22 PM, Yang <teddyyyy123@gmail.com> wrote:
> Found the reason.
>
> IncomingTcpConnection.run() hit an exception and the thread
> terminated. The next incarnation of the thread did not come up until
> 20 seconds later, which caused the TimedOutException and
> UnavailableException to clients.
>
>
>
>  WARN [Thread-28] 2011-09-28 02:17:57,561 IncomingTcpConnection.java
> (line 122) eof reading from socket; closing
> java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:112)
>
>
>
> I don't know whether the EOF here is really due to the network or to
> something in the code. (If it's really the network, is there a way to let
> IncomingTcpConnection fire up the next one faster, like within 1 second?
> I'm reading through the code to find it.)
>
> Thanks
> Yang
>
>
>
> On Sun, Sep 25, 2011 at 1:04 PM, Brandon Williams <driftx@gmail.com> wrote:
>> On Sun, Sep 25, 2011 at 1:10 PM, Yang <teddyyyy123@gmail.com> wrote:
>>> Thanks Brandon.
>>>
>>> I'll try this.
>>>
>>> but you can also see my later post regarding message drop :
>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3CCAAnh3_8AeHidYH9ybt82_EMH3LikbCDseNRak3JHfzaJ2L+9zQ@mail.gmail.com%3E
>>>
>>> that seems to show something in either code or background load causing
>>> messages to be really dropped
>>
>> I see.  My guess is then this: there is a local clock problem, causing
>> generations to be the same, thus not notifying the FD.  So perhaps the
>> problem is not network-related, but it is something in the ec2
>> environment.
>>
>> -Brandon
>>
>
