incubator-cassandra-user mailing list archives

From Ingram Chen <ingramc...@gmail.com>
Subject Re: tcp CLOSE_WAIT bug
Date Thu, 22 Apr 2010 01:53:47 GMT
I agree with your point. I patched the code and added more logging to find out
the real cause.

Here is the code snippet that I think may be the cause:

IncomingTcpConnection:

    public void run()
    {
        while (true)
        {
            try
            {
                MessagingService.validateMagic(input.readInt());
                int header = input.readInt();
                int type = MessagingService.getBits(header, 1, 2);
                boolean isStream = MessagingService.getBits(header, 3, 1) == 1;
                int version = MessagingService.getBits(header, 15, 8);

                if (isStream)
                {
                    new IncomingStreamReader(socket.getChannel()).read();
                }
                else
                {
                    int size = input.readInt();
                    byte[] contentBytes = new byte[size];
                    input.readFully(contentBytes);
                    MessagingService.getDeserializationExecutor().submit(
                            new MessageDeserializationTask(new ByteArrayInputStream(contentBytes)));
                }
            }
            catch (EOFException e)
            {
                if (logger.isTraceEnabled())
                    logger.trace("eof reading from socket; closing", e);
                break;
            }
            catch (IOException e)
            {
                if (logger.isDebugEnabled())
                    logger.debug("error reading from socket; closing", e);
                break;
            }
        }
    }

Under normal conditions the while loop terminates after input.readInt() throws
EOFException, but it exits without calling socket.close(). What I did is wrap the
whole while block inside a try { ... } finally { socket.close(); }.
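
For reference, the patched run() looks roughly like this (socket, input and
logger are the same fields used in the snippet above; the loop body is unchanged,
only the outer try/finally is new):

    public void run()
    {
        try
        {
            while (true)
            {
                try
                {
                    MessagingService.validateMagic(input.readInt());
                    // ... rest of the loop body unchanged: read the header, then either
                    // stream via IncomingStreamReader or hand the message bytes to the
                    // deserialization executor ...
                }
                catch (EOFException e)
                {
                    if (logger.isTraceEnabled())
                        logger.trace("eof reading from socket; closing", e);
                    break;
                }
                catch (IOException e)
                {
                    if (logger.isDebugEnabled())
                        logger.debug("error reading from socket; closing", e);
                    break;
                }
            }
        }
        finally
        {
            // close the accepted socket once the read loop exits, so the peer's FIN
            // does not leave this end of the connection stuck in CLOSE_WAIT
            try
            {
                socket.close();
            }
            catch (IOException e)
            {
                if (logger.isDebugEnabled())
                    logger.debug("error closing socket", e);
            }
        }
    }

Closing in the finally block also releases the socket if a RuntimeException ever
escapes the loop.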


On Thu, Apr 22, 2010 at 01:14, Jonathan Ellis <jbellis@gmail.com> wrote:

> I'd like to get something besides "I'm seeing CLOSE_WAIT but I have no
> idea why" for a bug report, since most people aren't seeing that.
>
> On Tue, Apr 20, 2010 at 9:33 AM, Ingram Chen <ingramchen@gmail.com> wrote:
> > I traced the IncomingStreamReader source and found that the incoming socket
> > comes from MessagingService$SocketThread,
> > but there is no close() call on either the accepted socket or the socketChannel.
> >
> > Should I file a bug report?
> >
> > On Tue, Apr 20, 2010 at 11:02, Ingram Chen <ingramchen@gmail.com> wrote:
> >>
> >> This happened after several hours of operation, and both nodes were
> >> started at the same time (a clean start without any data), so it might not
> >> be related to Bootstrap.
> >>
> >> In system.log I do not see any messages like "xxx node dead" or any
> >> exceptions, and both nodes in the test are alive; they serve reads/writes
> >> well, too. The four connections between the nodes shown below stay healthy
> >> over time:
> >>
> >> tcp        0      0 ::ffff:192.168.2.87:7000    ::ffff:192.168.2.88:58447   ESTABLISHED
> >> tcp        0      0 ::ffff:192.168.2.87:54986   ::ffff:192.168.2.88:7000    ESTABLISHED
> >> tcp        0      0 ::ffff:192.168.2.87:59138   ::ffff:192.168.2.88:7000    ESTABLISHED
> >> tcp        0      0 ::ffff:192.168.2.87:7000    ::ffff:192.168.2.88:39074   ESTABLISHED
> >>
> >> So the connections that end up in CLOSE_WAIT should be newly created ones
> >> (for streaming?). This seems related to the streaming issues we suffered
> >> recently:
> >> http://n2.nabble.com/busy-thread-on-IncomingStreamReader-td4908640.html
> >>
> >> I would like to add some debug code around the opening and closing of
> >> sockets to find out what happened.
> >>
> >> Could you give me a hint about which classes I should look at?
> >>
> >>
> >> On Tue, Apr 20, 2010 at 04:47, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>>
> >>> Is this after doing a bootstrap or other streaming operation?  Or did
> >>> a node go down?
> >>>
> >>> The internal sockets are supposed to remain open, otherwise.
> >>>
> >>> On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen <ingramchen@gmail.com>
> >>> wrote:
> >>> > Thanks for the information.
> >>> >
> >>> > We do use connection pools with the Thrift client, and ThriftAddress is
> >>> > on port 9160.
> >>> >
> >>> > The problematic connections we found are all on port 7000, which is the
> >>> > internal communication port between nodes. I guess this is related to
> >>> > StreamingService.
> >>> >
> >>> > On Mon, Apr 19, 2010 at 23:46, Brandon Williams <driftx@gmail.com>
> >>> > wrote:
> >>> >>
> >>> >> On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen <ingramchen@gmail.com>
> >>> >> wrote:
> >>> >>>
> >>> >>> Hi all,
> >>> >>>
> >>> >>>     We have observed several connections between nodes in CLOSE_WAIT
> >>> >>> after several hours of operation:
> >>> >>
> >>> >> This is symptomatic of not pooling your client connections correctly.
> >>> >> Be sure you're using one connection per thread, not one connection per
> >>> >> operation.
> >>> >> -Brandon
> >>> >
> >>> >
> >>> > --
> >>> > Ingram Chen
> >>> > online share order: http://dinbendon.net
> >>> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
> >>> >
> >>
> >>
> >>
> >> --
> >> Ingram Chen
> >> online share order: http://dinbendon.net
> >> blog: http://www.javaworld.com.tw/roller/page/ingramchen
> >
> >
> >
> > --
> > Ingram Chen
> > online share order: http://dinbendon.net
> > blog: http://www.javaworld.com.tw/roller/page/ingramchen
> >
>



-- 
Ingram Chen
online share order: http://dinbendon.net
blog: http://www.javaworld.com.tw/roller/page/ingramchen
