Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 23197 invoked from network); 22 Apr 2010 01:54:16 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 22 Apr 2010 01:54:16 -0000 Received: (qmail 95427 invoked by uid 500); 22 Apr 2010 01:54:15 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 95412 invoked by uid 500); 22 Apr 2010 01:54:15 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 95404 invoked by uid 99); 22 Apr 2010 01:54:15 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Apr 2010 01:54:15 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=FREEMAIL_FROM,HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_NONE,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of ingramchen@gmail.com designates 209.85.222.181 as permitted sender) Received: from [209.85.222.181] (HELO mail-pz0-f181.google.com) (209.85.222.181) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Apr 2010 01:54:09 +0000 Received: by pzk11 with SMTP id 11so3042510pzk.28 for ; Wed, 21 Apr 2010 18:53:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:received:message-id:subject:from:to:content-type; bh=SJksT/H8iEX79pW6oqfIPqaYHTFURZbxYOfi6IKDxhI=; b=gyvqmY7b0EK8bi1VkeQ9VytpRCBWXnQCPIbibRUqsQWBPWq0NN9QFDMKZUTfuoFtkK rs5qlsJ0pG5v7XXI6nOYlFQlPkN1QptNQvd3GM7R4kB2HQkPvJ9XUgmfvzfbBN86SmNx xRV4fJBZkYUIGCGoDJvtp/Xm4H8nUygDoH17U= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=HKxjxs7GR4Ep4MCYR/bW8zZ0RXYCLnax5B18v0E8Qfw52V627reuxC+UTy99sgyk5l XFgiuGzJUDWOleQFOQtQiWWp0wwatGzy3tpj0RIZbQiYlnejn73Tcb9Yr7gz/IS4+0ZO 3LQS1MLe41ujMa1Vcd4xKF06RstcxwqmN7C3Y= MIME-Version: 1.0 Received: by 10.141.49.14 with HTTP; Wed, 21 Apr 2010 18:53:47 -0700 (PDT) In-Reply-To: References: Date: Thu, 22 Apr 2010 09:53:47 +0800 Received: by 10.141.14.4 with SMTP id r4mr838763rvi.171.1271901227356; Wed, 21 Apr 2010 18:53:47 -0700 (PDT) Message-ID: Subject: Re: tcp CLOSE_WAIT bug From: Ingram Chen To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=000e0cd11240d892990484c99430 X-Virus-Checked: Checked by ClamAV on apache.org --000e0cd11240d892990484c99430 Content-Type: text/plain; charset=UTF-8 I agree your point. I patch the code and log more informations to find out the real cause. Here is the code snip I think may be the cause: IncomingTcpConnection: public void run() { while (true) { try { MessagingService.validateMagic(input.readInt()); int header = input.readInt(); int type = MessagingService.getBits(header, 1, 2); boolean isStream = MessagingService.getBits(header, 3, 1) == 1; int version = MessagingService.getBits(header, 15, 8); if (isStream) { new IncomingStreamReader(socket.getChannel()).read(); } else { int size = input.readInt(); byte[] contentBytes = new byte[size]; input.readFully(contentBytes); MessagingService.getDeserializationExecutor().submit(new MessageDeserializationTask(new ByteArrayInputStream(contentBytes))); } } catch (EOFException e) { if (logger.isTraceEnabled()) logger.trace("eof reading from socket; closing", e); break; } catch (IOException e) { if (logger.isDebugEnabled()) logger.debug("error reading from socket; closing", e); break; } } } In normal condition, while loop is terminated after input.readInt() throw EOFException. but it quits without socket.close(). what I do is wrap whole while block inside a try { ... } finally {socket.close();} On Thu, Apr 22, 2010 at 01:14, Jonathan Ellis wrote: > I'd like to get something besides "I'm seeing close wait but i have no > idea why" for a bug report, since most people aren't seeing that. > > On Tue, Apr 20, 2010 at 9:33 AM, Ingram Chen wrote: > > I trace IncomingStreamReader source and found that incoming socket comes > > from MessagingService$SocketThread. > > but there is no close() call on either accepted socket or socketChannel. > > > > Should I file a bug report ? > > > > On Tue, Apr 20, 2010 at 11:02, Ingram Chen wrote: > >> > >> this happened after several hours of operations and both nodes are > started > >> at the same time (clean start without any data). so it might not relate > to > >> Bootstrap. > >> > >> In system.log I do not see any logs like "xxx node dead" or exceptions. > >> and both nodes in test are alive. they serve read/write well, too. Below > >> four connections between nodes are keep healthy from time to time. > >> > >> tcp 0 0 ::ffff:192.168.2.87:7000 > >> ::ffff:192.168.2.88:58447 ESTABLISHED > >> tcp 0 0 ::ffff:192.168.2.87:54986 > >> ::ffff:192.168.2.88:7000 ESTABLISHED > >> tcp 0 0 ::ffff:192.168.2.87:59138 > >> ::ffff:192.168.2.88:7000 ESTABLISHED > >> tcp 0 0 ::ffff:192.168.2.87:7000 > >> ::ffff:192.168.2.88:39074 ESTABLISHED > >> > >> so connections end in CLOSE_WAIT should be newly created. (for streaming > >> ?) This seems related to streaming issues we suffered recently: > >> http://n2.nabble.com/busy-thread-on-IncomingStreamReader-td4908640.html > >> > >> I would like add some debug codes around opening and closing of socket > to > >> find out what happend. > >> > >> Could you give me some hint, about what classes I should take look ? > >> > >> > >> On Tue, Apr 20, 2010 at 04:47, Jonathan Ellis > wrote: > >>> > >>> Is this after doing a bootstrap or other streaming operation? Or did > >>> a node go down? > >>> > >>> The internal sockets are supposed to remain open, otherwise. > >>> > >>> On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen > >>> wrote: > >>> > Thank your information. > >>> > > >>> > We do use connection pools with thrift client and ThriftAdress is on > >>> > port > >>> > 9160. > >>> > > >>> > Those problematic connections we found are all in port 7000, which is > >>> > internal communications port between > >>> > nodes. I guess this related to StreamingService. > >>> > > >>> > On Mon, Apr 19, 2010 at 23:46, Brandon Williams > >>> > wrote: > >>> >> > >>> >> On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen > > >>> >> wrote: > >>> >>> > >>> >>> Hi all, > >>> >>> > >>> >>> We have observed several connections between nodes in > CLOSE_WAIT > >>> >>> after several hours of operation: > >>> >> > >>> >> This is symptomatic of not pooling your client connections > correctly. > >>> >> Be > >>> >> sure you're using one connection per thread, not one connection per > >>> >> operation. > >>> >> -Brandon > >>> > > >>> > > >>> > -- > >>> > Ingram Chen > >>> > online share order: http://dinbendon.net > >>> > blog: http://www.javaworld.com.tw/roller/page/ingramchen > >>> > > >> > >> > >> > >> -- > >> Ingram Chen > >> online share order: http://dinbendon.net > >> blog: http://www.javaworld.com.tw/roller/page/ingramchen > > > > > > > > -- > > Ingram Chen > > online share order: http://dinbendon.net > > blog: http://www.javaworld.com.tw/roller/page/ingramchen > > > -- Ingram Chen online share order: http://dinbendon.net blog: http://www.javaworld.com.tw/roller/page/ingramchen --000e0cd11240d892990484c99430 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I agree your point. I patch the code and log more informations to find out = the real cause.

Here is the code snip I think may be the cause:
<= br>IncomingTcpConnection:

=C2=A0= =C2=A0=C2=A0 public void run()
=C2=A0=C2=A0=C2=A0 {
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 w= hile (true)
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 {
=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 try
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 {
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 MessagingService.validateMagic(input.readInt());
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int h= eader =3D input.readInt();
=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= int type =3D MessagingService.getBits(header, 1, 2);
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 boole= an isStream =3D MessagingService.getBits(header, 3, 1) =3D=3D 1;
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int version =3D MessagingService.ge= tBits(header, 15, 8);

=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (isStream)
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 {
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 new IncomingStreamReader(socket.getChannel()).read();
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 }
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else<= /span>
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 {
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 int size =3D input.readInt();
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 byte[] contentBytes =3D ne= w byte[size];
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 input.readFully(contentBytes);
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 MessagingService.getDeseri= alizationExecutor().submit(new MessageDeserializationTask(new ByteArrayInpu= tStream(contentBytes)));
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 }
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 }
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 catch (EOFException e)=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 {
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (l= ogger.isTraceEnabled())

=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 logger.trace("eof reading from socket; closing&q= uot;, e);
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 break= ;
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 }
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 catch (IOException e)<= br style=3D"font-family: courier new,monospace;">=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 {
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (l= ogger.isDebugEnabled())

=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 logger.debug("error reading from socket; closing= ", e);
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 break= ;
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 }
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 }
=C2=A0=C2=A0=C2=A0= }

In normal condition, input.readInt() throw EOFException. but it quits without socket.close= (). what I do is wrap whole while block inside a try { ... } finally {socket.close();}


On Thu, Apr 22, 2010 at 01:14, Jonathan = Ellis <jbellis@gm= ail.com> wrote:
I'd like to get something besides "I'm seeing close wait but i= have no
idea why" for a bug report, since most people aren't seeing that.<= br>

On Tue, Apr 20, 2010 at 9:33 AM, Ingram Chen <ingramchen@gmail.com> wrote:
> I trace IncomingStreamReader source and found that incoming socket com= es
> from MessagingService$SocketThread.
> but there is no close() call on either accepted socket or socketChanne= l.
>
> Should I file a bug report ?
>
> On Tue, Apr 20, 2010 at 11:02, Ingram Chen <ingramchen@gmail.com> wrote:
>>
>> this happened after several hours of operations and both nodes are= started
>> at the same time (clean start without any data). so it might not r= elate to
>> Bootstrap.
>>
>> In system.log I do not see any logs like "xxx node dead"= or exceptions.
>> and both nodes in test are alive. they serve read/write well, too.= Below
>> four connections between nodes are keep healthy from time to time.=
>>
>> tcp=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 0 ::ffff:192.168.2.87:7000
>> ::ffff:192= .168.2.88:58447=C2=A0=C2=A0 ESTABLISHED
>> tcp=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 0 ::ffff:192.168.2.87:54986
>> ::ffff:192.= 168.2.88:7000=C2=A0=C2=A0=C2=A0 ESTABLISHED
>> tcp=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 0 ::ffff:192.168.2.87:59138
>> ::ffff:192.= 168.2.88:7000=C2=A0=C2=A0=C2=A0 ESTABLISHED
>> tcp=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 0 ::ffff:192.168.2.87:7000
>> ::ffff:192= .168.2.88:39074=C2=A0=C2=A0 ESTABLISHED
>>
>> so connections end in CLOSE_WAIT should be newly created. (for str= eaming
>> ?) This seems related to streaming issues we suffered recently: >> http://n2.nabble.com/busy-thread-on-Inc= omingStreamReader-td4908640.html
>>
>> I would like add some debug codes around opening and closing of so= cket to
>> find out what happend.
>>
>> Could you give me some hint, about what classes I should take look= ?
>>
>>
>> On Tue, Apr 20, 2010 at 04:47, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>
>>> Is this after doing a bootstrap or other streaming operation? = =C2=A0Or did
>>> a node go down?
>>>
>>> The internal sockets are supposed to remain open, otherwise. >>>
>>> On Mon, Apr 19, 2010 at 10:56 AM, Ingram Chen <ingramchen@gmail.com>
>>> wrote:
>>> > Thank your information.
>>> >
>>> > We do use connection pools with thrift client and ThriftA= dress is on
>>> > port
>>> > 9160.
>>> >
>>> > Those problematic connections we found are all in port 70= 00, which is
>>> > internal communications port between
>>> > nodes. I guess this related to StreamingService.
>>> >
>>> > On Mon, Apr 19, 2010 at 23:46, Brandon Williams <driftx@gmail.com>
>>> > wrote:
>>> >>
>>> >> On Mon, Apr 19, 2010 at 10:27 AM, Ingram Chen <ingramchen@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> =C2=A0=C2=A0=C2=A0 We have observed several conne= ctions between nodes in CLOSE_WAIT
>>> >>> after several hours of operation:
>>> >>
>>> >> This is symptomatic of not pooling your client connec= tions correctly.
>>> >> =C2=A0Be
>>> >> sure you're using one connection per thread, not = one connection per
>>> >> operation.
>>> >> -Brandon
>>> >
>>> >
>>> > --
>>> > Ingram Chen
>>> > online share order: http://dinbendon.net
>>> > blog: http://www.javaworld.com.tw/roller/page/ingra= mchen
>>> >
>>
>>
>>
>> --
>> Ingram Chen
>> online share order: http://dinbendon.net
>> blog: http://www.javaworld.com.tw/roller/page/ingramchen=
>
>
>
> --
> Ingram Chen
> online share order: http://dinbendon.net
> blog: http://www.javaworld.com.tw/roller/page/ingramchen
>



--
Ingram Chen=
online share order: http://dinbendon.n= et
blog: http://www.javaworld.com.tw/roller/page/ingramchen
--000e0cd11240d892990484c99430--