cassandra-user mailing list archives

From yangfeng <yea...@gmail.com>
Subject Re: tcp CLOSE_WAIT bug
Date Sun, 25 Apr 2010 10:55:49 GMT
I encountered the same problem! Hope to get some help. Thanks.

2010/4/22 Ingram Chen <ingramchen@gmail.com>

> Ah! That's right.
>
> I checked OutboundTcpConnection, and it only does closeSocket() after
> something goes wrong. I will add more logging in OutboundTcpConnection to
> see what actually happens.
>
> Thanks for your help.
>
>
>
>
> On Thu, Apr 22, 2010 at 10:03, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>> But those connections aren't supposed to ever terminate unless a node
>> dies or is partitioned.  So if we "fix" it by adding a socket.close I
>> worry that we're covering up something more important.
>>
>> On Wed, Apr 21, 2010 at 8:53 PM, Ingram Chen <ingramchen@gmail.com>
>> wrote:
>> > I agree with your point. I'll patch the code and log more information to
>> > find out the real cause.
>> >
>> > Here is the code snippet I think may be the cause:
>> >
>> > IncomingTcpConnection:
>> >
>> >     public void run()
>> >     {
>> >         while (true)
>> >         {
>> >             try
>> >             {
>> >                 MessagingService.validateMagic(input.readInt());
>> >                 int header = input.readInt();
>> >                 int type = MessagingService.getBits(header, 1, 2);
>> >                 boolean isStream = MessagingService.getBits(header, 3, 1) == 1;
>> >                 int version = MessagingService.getBits(header, 15, 8);
>> >
>> >                 if (isStream)
>> >                 {
>> >                     new IncomingStreamReader(socket.getChannel()).read();
>> >                 }
>> >                 else
>> >                 {
>> >                     int size = input.readInt();
>> >                     byte[] contentBytes = new byte[size];
>> >                     input.readFully(contentBytes);
>> >                     MessagingService.getDeserializationExecutor().submit(
>> >                         new MessageDeserializationTask(new ByteArrayInputStream(contentBytes)));
>> >                 }
>> >             }
>> >             catch (EOFException e)
>> >             {
>> >                 if (logger.isTraceEnabled())
>> >                     logger.trace("eof reading from socket; closing", e);
>> >                 break;
>> >             }
>> >             catch (IOException e)
>> >             {
>> >                 if (logger.isDebugEnabled())
>> >                     logger.debug("error reading from socket; closing", e);
>> >                 break;
>> >             }
>> >         }
>> >     }
>> >
>> > Under normal conditions, the while loop terminates after input.readInt()
>> > throws EOFException, but it quits without calling socket.close(). What I did
>> > is wrap the whole while block inside a try { ... } finally { socket.close(); }
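>> >
>> > A minimal sketch of that patch (assuming socket and logger are the
>> > connection's existing fields; the loop body itself is unchanged):
>> >
>> >     public void run()
>> >     {
>> >         try
>> >         {
>> >             while (true)
>> >             {
>> >                 // ... existing read/deserialize loop, exactly as above ...
>> >             }
>> >         }
>> >         finally
>> >         {
>> >             // always release the socket, whether we left the loop via
>> >             // EOFException, IOException, or anything else
>> >             try
>> >             {
>> >                 socket.close();
>> >             }
>> >             catch (IOException e)
>> >             {
>> >                 if (logger.isDebugEnabled())
>> >                     logger.debug("error closing socket", e);
>> >             }
>> >         }
>> >     }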
>> >
>> >
>> > On Thu, Apr 22, 2010 at 01:14, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>
>> >> I'd like to get something besides "I'm seeing CLOSE_WAIT but I have no
>> >> idea why" for a bug report, since most people aren't seeing that.
>> >>
>> >> On Tue, Apr 20, 2010 at 9:33 AM, Ingram Chen <ingramchen@gmail.com> wrote:
>> >> > I traced the IncomingStreamReader source and found that the incoming
>> >> > socket comes from MessagingService$SocketThread, but there is no close()
>> >> > call on either the accepted socket or the socketChannel.
>> >> >
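>> >> > Roughly, the accept loop there looks like this (simplified; each accepted
>> >> > socket is handed off to an IncomingTcpConnection thread, and nothing ever
>> >> > calls close() on it afterwards):
>> >> >
>> >> >     public void run()
>> >> >     {
>> >> >         while (true)
>> >> >         {
>> >> >             try
>> >> >             {
>> >> >                 // accepted socket is never closed by this class
>> >> >                 Socket socket = server.accept();
>> >> >                 new IncomingTcpConnection(socket).start();
>> >> >             }
>> >> >             catch (IOException e)
>> >> >             {
>> >> >                 // (error handling elided in this sketch)
>> >> >             }
>> >> >         }
>> >> >     }
>> >> >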
>> >> > Should I file a bug report?
>> >> >
>> >> > On Tue, Apr 20, 2010 at 11:02, Ingram Chen <ingramchen@gmail.com> wrote:
>> >> >>
>> >> >> This happened after several hours of operation, and both nodes were
>> >> >> started at the same time (a clean start, without any data), so it might
>> >> >> not be related to bootstrap.
>> >> >>
>> >> >> In system.log I do not see any entries like "xxx node dead" or any
>> >> >> exceptions, and both nodes in the test are alive. They serve reads and
>> >> >> writes well, too. The four connections between the nodes below stay
>> >> >> healthy the whole time:
>> >> >>
>> >> >> tcp        0      0 ::ffff:192.168.2.87:7000    ::ffff:192.168.2.88:58447   ESTABLISHED
>> >> >> tcp        0      0 ::ffff:192.168.2.87:54986   ::ffff:192.168.2.88:7000    ESTABLISHED
>> >> >> tcp        0      0 ::ffff:192.168.2.87:59138   ::ffff:192.168.2.88:7000    ESTABLISHED
>> >> >> tcp        0      0 ::ffff:192.168.2.87:7000    ::ffff:192.168.2.88:39074   ESTABLISHED
>> >> >>
>> >> >> So the connections that end in CLOSE_WAIT should be newly created ones
>> >> >> (for streaming?). This seems related to the streaming issues we suffered
>> >> >> recently:
>> >> >> http://n2.nabble.com/busy-thread-on-IncomingStreamReader-td4908640.html
>> >> >>
>> >> >> I would like to add some debug code around the opening and closing of
>> >> >> sockets to find out what happened (a sketch of what I have in mind
>> >> >> follows). Could you give me a hint about which classes I should look at?
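>> >> >>
>> >> >> (Hypothetical example of the kind of tracing I mean:)
>> >> >>
>> >> >>     // at each place a socket is accepted or connected:
>> >> >>     if (logger.isDebugEnabled())
>> >> >>         logger.debug("opened socket " + socket.getRemoteSocketAddress());
>> >> >>
>> >> >>     // and at each place one is (or should be) closed:
>> >> >>     if (logger.isDebugEnabled())
>> >> >>         logger.debug("closing socket " + socket.getRemoteSocketAddress());
>> >> >>     socket.close();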
>> >> >>
>> >> >>
>> >> >> On Tue, Apr 20, 2010 at 04:47, Jonathan Ellis <jbellis@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> Is this after doing a bootstrap or other streaming operation? Or did
>> >> >>> a node go down?
>> >> >>>
>> >> >>> Otherwise, the internal sockets are supposed to remain open.
>> >> >>>
>> >> >>> On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen <ingramchen@gmail.com>
>> >> >>> wrote:
>> >> >>> > Thanks for the information.
>> >> >>> >
>> >> >>> > We do use connection pools with the Thrift client, and ThriftAddress
>> >> >>> > is on port 9160.
>> >> >>> >
>> >> >>> > The problematic connections we found are all on port 7000, which is
>> >> >>> > the internal communication port between nodes. I guess this is
>> >> >>> > related to StreamingService.
>> >> >>> >
>> >> >>> > On Mon, Apr 19, 2010 at 23:46, Brandon Williams <driftx@gmail.com>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen
>> >> >>> >> <ingramchen@gmail.com>
>> >> >>> >> wrote:
>> >> >>> >>>
>> >> >>> >>> Hi all,
>> >> >>> >>>
>> >> >>> >>>     We have observed several connections between nodes in CLOSE_WAIT
>> >> >>> >>> after several hours of operation:
>> >> >>> >>
>> >> >>> >> This is symptomatic of not pooling your client connections correctly.
>> >> >>> >> Be sure you're using one connection per thread, not one connection
>> >> >>> >> per operation.
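>> >> >>> >>
>> >> >>> >> A minimal per-thread sketch (host/port and error handling are
>> >> >>> >> illustrative only):
>> >> >>> >>
>> >> >>> >>     // one Thrift connection per thread, reused for every operation
>> >> >>> >>     private static final ThreadLocal<Cassandra.Client> CLIENT =
>> >> >>> >>         new ThreadLocal<Cassandra.Client>()
>> >> >>> >>     {
>> >> >>> >>         @Override
>> >> >>> >>         protected Cassandra.Client initialValue()
>> >> >>> >>         {
>> >> >>> >>             try
>> >> >>> >>             {
>> >> >>> >>                 TTransport transport = new TSocket("192.168.2.87", 9160);
>> >> >>> >>                 transport.open();
>> >> >>> >>                 return new Cassandra.Client(new TBinaryProtocol(transport));
>> >> >>> >>             }
>> >> >>> >>             catch (TTransportException e)
>> >> >>> >>             {
>> >> >>> >>                 throw new RuntimeException(e);
>> >> >>> >>             }
>> >> >>> >>         }
>> >> >>> >>     };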
>> >> >>> >> -Brandon
>> >> >>> >
>> >> >>> >
>> >> >>> > --
>> >> >>> > Ingram Chen
>> >> >>> > online share order: http://dinbendon.net
>> >> >>> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
>> >> >>> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Ingram Chen
>> >> >> online share order: http://dinbendon.net
>> >> >> blog: http://www.javaworld.com.tw/roller/page/ingramchen
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Ingram Chen
>> >> > online share order: http://dinbendon.net
>> >> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
>> >> >
>> >
>> >
>> >
>> > --
>> > Ingram Chen
>> > online share order: http://dinbendon.net
>> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
>> >
>>
>
>
>
> --
> Ingram Chen
> online share order: http://dinbendon.net
> blog: http://www.javaworld.com.tw/roller/page/ingramchen
>
