incubator-cassandra-user mailing list archives

From Ingram Chen <ingramc...@gmail.com>
Subject Re: tcp CLOSE_WAIT bug
Date Thu, 22 Apr 2010 02:57:14 GMT
Ah! That's right.

I checked OutboundTcpConnection, and it only does closeSocket() after something
goes wrong. I will add more logging in OutboundTcpConnection to see what
actually happens.

Thanks for your help.
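For reference, the try/finally wrap described below (in my earlier message) can be sketched roughly like this. It is a minimal standalone demo of the idea, not the actual patch; the readLoop method is a simplified stand-in for IncomingTcpConnection.run(), and the main method just fakes a peer that disconnects:

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseOnEofDemo {
    // Simplified stand-in for IncomingTcpConnection.run(): read ints until
    // EOF. The finally block guarantees socket.close() runs when the loop
    // exits; without it, the socket would sit in CLOSE_WAIT after the peer
    // disconnects.
    static void readLoop(Socket socket) throws IOException {
        DataInputStream input = new DataInputStream(socket.getInputStream());
        try {
            while (true) {
                try {
                    input.readInt(); // header, payload, etc.
                } catch (EOFException e) {
                    break; // peer closed; we still hold the fd at this point
                }
            }
        } finally {
            socket.close(); // the missing call
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        Socket accepted = server.accept();
        client.getOutputStream().write(new byte[] {0, 0, 0, 42}); // one int
        client.close(); // triggers EOF on the accepted side
        readLoop(accepted);
        System.out.println("closed=" + accepted.isClosed());
        server.close();
    }
}
```
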



On Thu, Apr 22, 2010 at 10:03, Jonathan Ellis <jbellis@gmail.com> wrote:

> But those connections aren't supposed to ever terminate unless a node
> dies or is partitioned.  So if we "fix" it by adding a socket.close I
> worry that we're covering up something more important.
>
> On Wed, Apr 21, 2010 at 8:53 PM, Ingram Chen <ingramchen@gmail.com> wrote:
> > I agree with your point. I patched the code and added more logging to find
> > out the real cause.
> >
> > Here is the code snippet I think may be the cause:
> >
> > IncomingTcpConnection:
> >
> >     public void run()
> >     {
> >         while (true)
> >         {
> >             try
> >             {
> >                 MessagingService.validateMagic(input.readInt());
> >                 int header = input.readInt();
> >                 int type = MessagingService.getBits(header, 1, 2);
> >                 boolean isStream = MessagingService.getBits(header, 3, 1) == 1;
> >                 int version = MessagingService.getBits(header, 15, 8);
> >
> >                 if (isStream)
> >                 {
> >                     new IncomingStreamReader(socket.getChannel()).read();
> >                 }
> >                 else
> >                 {
> >                     int size = input.readInt();
> >                     byte[] contentBytes = new byte[size];
> >                     input.readFully(contentBytes);
> >                     MessagingService.getDeserializationExecutor().submit(
> >                         new MessageDeserializationTask(new ByteArrayInputStream(contentBytes)));
> >                 }
> >             }
> >             catch (EOFException e)
> >             {
> >                 if (logger.isTraceEnabled())
> >                     logger.trace("eof reading from socket; closing", e);
> >                 break;
> >             }
> >             catch (IOException e)
> >             {
> >                 if (logger.isDebugEnabled())
> >                     logger.debug("error reading from socket; closing", e);
> >                 break;
> >             }
> >         }
> >     }
> >
> > In the normal case, the while loop terminates when input.readInt() throws
> > EOFException, but it quits without calling socket.close(). What I did is wrap
> > the whole while block inside a try { ... } finally { socket.close(); }
> >
> >
> > On Thu, Apr 22, 2010 at 01:14, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>
> >> I'd like to get something besides "I'm seeing close wait but i have no
> >> idea why" for a bug report, since most people aren't seeing that.
> >>
> >> On Tue, Apr 20, 2010 at 9:33 AM, Ingram Chen <ingramchen@gmail.com>
> wrote:
> >> > I traced the IncomingStreamReader source and found that the incoming
> >> > socket comes from MessagingService$SocketThread,
> >> > but there is no close() call on either the accepted socket or the
> >> > socketChannel.
> >> >
> >> > Should I file a bug report?
> >> >
> >> > On Tue, Apr 20, 2010 at 11:02, Ingram Chen <ingramchen@gmail.com>
> wrote:
> >> >>
> >> >> This happened after several hours of operation, and both nodes were
> >> >> started at the same time (a clean start without any data), so it might
> >> >> not be related to bootstrap.
> >> >>
> >> >> In system.log I do not see any logs like "xxx node dead" or exceptions,
> >> >> and both nodes in the test are alive. They serve reads/writes well, too.
> >> >> The four connections below between the nodes stay healthy the whole time.
> >> >>
> >> >> tcp        0      0 ::ffff:192.168.2.87:7000
> >> >> ::ffff:192.168.2.88:58447   ESTABLISHED
> >> >> tcp        0      0 ::ffff:192.168.2.87:54986
> >> >> ::ffff:192.168.2.88:7000    ESTABLISHED
> >> >> tcp        0      0 ::ffff:192.168.2.87:59138
> >> >> ::ffff:192.168.2.88:7000    ESTABLISHED
> >> >> tcp        0      0 ::ffff:192.168.2.87:7000
> >> >> ::ffff:192.168.2.88:39074   ESTABLISHED
> >> >>
> >> >> So the connections ending in CLOSE_WAIT should be newly created ones
> >> >> (for streaming?). This seems related to the streaming issues we
> >> >> suffered recently:
> >> >>
> >> >> http://n2.nabble.com/busy-thread-on-IncomingStreamReader-td4908640.html
> >> >>
> >> >> I would like to add some debug code around the opening and closing of
> >> >> sockets to find out what happened.
> >> >>
> >> >> Could you give me some hints about which classes I should take a look at?
> >> >>
> >> >>
> >> >> On Tue, Apr 20, 2010 at 04:47, Jonathan Ellis <jbellis@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Is this after doing a bootstrap or other streaming operation?  Or did
> >> >>> a node go down?
> >> >>>
> >> >>> The internal sockets are supposed to remain open, otherwise.
> >> >>>
> >> >>> On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen <ingramchen@gmail.com>
> >> >>> wrote:
> >> >>> > Thanks for your information.
> >> >>> >
> >> >>> > We do use connection pools with the thrift client, and ThriftAddress
> >> >>> > is on port 9160.
> >> >>> >
> >> >>> > The problematic connections we found are all on port 7000, which is
> >> >>> > the internal communication port between nodes. I guess this is
> >> >>> > related to StreamingService.
> >> >>> >
> >> >>> > On Mon, Apr 19, 2010 at 23:46, Brandon Williams <driftx@gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen
> >> >>> >> <ingramchen@gmail.com>
> >> >>> >> wrote:
> >> >>> >>>
> >> >>> >>> Hi all,
> >> >>> >>>
> >> >>> >>>     We have observed several connections between nodes
in
> >> >>> >>> CLOSE_WAIT
> >> >>> >>> after several hours of operation:
> >> >>> >>
> >> >>> >> This is symptomatic of not pooling your client connections
> >> >>> >> correctly.  Be sure you're using one connection per thread, not one
> >> >>> >> connection per operation.
> >> >>> >> -Brandon
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > Ingram Chen
> >> >>> > online share order: http://dinbendon.net
> >> >>> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
> >> >>> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Ingram Chen
> >> >> online share order: http://dinbendon.net
> >> >> blog: http://www.javaworld.com.tw/roller/page/ingramchen
> >> >
> >> >
> >> >
> >> > --
> >> > Ingram Chen
> >> > online share order: http://dinbendon.net
> >> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
> >> >
> >
> >
> >
> > --
> > Ingram Chen
> > online share order: http://dinbendon.net
> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
> >
>
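(As an aside, for anyone who lands on this thread because of client-side CLOSE_WAIT on port 9160 rather than the internal port: the "one connection per thread" pooling Brandon recommends above can be sketched with a ThreadLocal. The Client class below is a hypothetical stand-in for a Thrift client connection, not a real Cassandra class.)

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadConnectionDemo {
    // Hypothetical stand-in for a Thrift client connection.
    static class Client {
        static final AtomicInteger opened = new AtomicInteger();
        Client() { opened.incrementAndGet(); }
        void execute() { /* a read/write against the cluster would go here */ }
    }

    // One connection per thread: opened lazily on first use, then reused
    // for every subsequent operation on that thread.
    static final ThreadLocal<Client> CONN = ThreadLocal.withInitial(Client::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100; i++)
                CONN.get().execute(); // reuses this thread's single connection
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // 200 operations total, but only 2 connections ever opened
        System.out.println("opened=" + Client.opened.get());
    }
}
```

With one-connection-per-operation, the count above would be 200 instead of 2, and each abandoned connection lingers until garbage collection finally closes it.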



-- 
Ingram Chen
online share order: http://dinbendon.net
blog: http://www.javaworld.com.tw/roller/page/ingramchen
