Return-Path: Delivered-To: apmail-incubator-cassandra-user-archive@minotaur.apache.org Received: (qmail 72544 invoked from network); 20 Aug 2009 23:38:01 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Aug 2009 23:38:01 -0000 Received: (qmail 97250 invoked by uid 500); 20 Aug 2009 23:38:19 -0000 Delivered-To: apmail-incubator-cassandra-user-archive@incubator.apache.org Received: (qmail 97231 invoked by uid 500); 20 Aug 2009 23:38:19 -0000 Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: cassandra-user@incubator.apache.org Delivered-To: mailing list cassandra-user@incubator.apache.org Received: (qmail 97218 invoked by uid 99); 20 Aug 2009 23:38:19 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 23:38:19 +0000 X-ASF-Spam-Status: No, hits=0.9 required=10.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_MED,TVD_FW_GRAPHIC_NAME_MID,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: unknown (nike.apache.org: error in processing during lookup of junrao@almaden.ibm.com) Received: from [32.97.182.142] (HELO e2.ny.us.ibm.com) (32.97.182.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 23:38:07 +0000 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.14.3/8.13.1) with ESMTP id n7KNVkeP024527 for ; Thu, 20 Aug 2009 19:31:46 -0400 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id n7KNbk3t179514 for ; Thu, 20 Aug 2009 19:37:46 -0400 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n7KNbkB6013345 for ; Thu, 20 Aug 2009 19:37:46 -0400 Received: from d01ml604.pok.ibm.com (d01ml604.pok.ibm.com [9.56.227.90]) by d01av01.pok.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id n7KNbjPQ013342 for ; Thu, 20 Aug 2009 19:37:45 -0400 In-Reply-To: <06AC1CD5-820F-48DD-82D7-ABFA0EEA7418@digitalreasoning.com> References: <18D28A58-4E5F-40F2-8F84-4A6D5E1AF1CD@digitalreasoning.com> <545D846A-5C5A-491F-AF4B-189F3FD71E50@digitalreasoning.com> <06AC1CD5-820F-48DD-82D7-ABFA0EEA7418@digitalreasoning.com> Subject: Re: quorum read timeout X-KeepSent: 046BA56C:D11C32F9-88257618:0081ABCE; type=4; name=$KeepSent To: cassandra-user@incubator.apache.org X-Mailer: Lotus Notes Release 8.0.2 HF623 January 16, 2009 Message-ID: From: Jun Rao Date: Thu, 20 Aug 2009 16:37:47 -0700 X-MIMETrack: Serialize by Router on D01ML604/01/M/IBM(Release 8.5|December 05, 2008) at 08/20/2009 19:37:45 MIME-Version: 1.0 Content-type: multipart/related; Boundary="0__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E" X-Virus-Checked: Checked by ClamAV on apache.org --0__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E Content-type: multipart/alternative; Boundary="1__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E" --1__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: quoted-printable You should be aware that Cassandra doesn't support changing column fami= lies without reloading the data. Jun IBM Almaden Research Center K55/B1, 650 Harry Road, San Jose, CA 95120-6099 junrao@almaden.ibm.com |------------> | From: | |------------> >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| |Phillip Michalak = = | >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| |------------> | To: | |------------> >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| |cassandra-user@incubator.apache.org = = | >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| |------------> | Date: | |------------> >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| |08/20/2009 04:16 PM = = | >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| |------------> | Subject: | |------------> >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| |Re: quorum read timeout = = | >--------------------------------------------------------------------= -----------------------------------------------------------------------= -------| I can't reproduce this after cleaning out the data/ directories (and co= mmit logs) and restarting the cluster. I suspect that I had gotten to an inconsistent state as I was playing around with the keyspace column families. I've updated the bug report with a comment to this effect, but have lef= t it open for someone with more experience to close it out. Cheers, Phil On Aug 19, 2009, at 5:14 PM, Phillip Michalak wrote: Sure thing. Filed as https://issues.apache.org/jira/browse/CASSANDRA-381 Thanks, Phil On Aug 19, 2009, at 4:54 PM, Jonathan Ellis wrote: Looks like a bug in TcpConnectionManager. Can you file a ticket? thanks, -Jonathan On Wed, Aug 19, 2009 at 2:49 PM, Phillip Michalak wrote: It's cassandra-0.4-beta1. Thanks! Phil On Aug 19, 2009, at 4:43 PM, Jonathan Ellis wrote: Is this 0.3 or 0.4/trunk? On Wed, Aug 19, 2009 at 2:36 PM, Phillip Michalak wr= ote: I'm running three Cassandra nodes in virt= ual machines. During a 'get' operation from Cassandra-remote directed at one of these= nodes, I'm receiving the following output= vadmin@vadmin:~/cassandra$ interface/gen-py/cassandra/Cassandra-remo= te -h 192.168.133.130:9160 get 'MockElementLibr= ary' '0401318uuuuruepwdcznr' "ColumnPath('strings', None, 'id')" 2 /usr/local/lib/python2.6/dist-packages/th= rift/Thrift.py:58: DeprecationWarning: BaseException.message= has been deprecated as of Python 2.6 self.message =3D message /usr/local/lib/python2.6/dist-packages/th= rift/Thrift.py:99: DeprecationWarning: BaseException.message= has been deprecated as of Python 2.6 self.message =3D iprot.readString(); Traceback (most recent call last): File "interface/gen-py/cassandra/Cassandra-rem= ote", line 93, in pp.pprint(client.get(args[0],args[1],e= val (args[2]),eval(args[3]),)) File "/home/vadmin/cassandra-0.4.0-beta1/inter= face/gen-py/cassandra/Cassandra.py", line 182, in get return self.recv_get() File "/home/vadmin/cassandra-0.4.0-beta1/inter= face/gen-py/cassandra/Cassandra.py", line 201, in recv_get raise x thrift.Thrift.TApplicationException/usr/l= ocal/lib/python2.6/dist-packages/thrift/Thrift.py:76: DeprecationWarning: BaseException.message= has been deprecated as of Python 2.6 if self.message: /usr/local/lib/python2.6/dist-packages/th= rift/Thrift.py:77: DeprecationWarning: BaseException.message= has been deprecated as of Python 2.6 return self.message : Internal error processing get The same 'get' operation from Cassandra-remote directed at another of these nodes, yields 'normal' output vadmin@vadmin:~/cassandra$ interface/gen-py/cassandra/Cassandra-remo= te -h 192.168.133.129:9160 get 'MockElementLibr= ary' '0401318uuuuruepwdcznr' "ColumnPath('strings', None, 'id')" 2 Traceback (most recent call last): File "interface/gen-py/cassandra/Cassandra-rem= ote", line 93, in pp.pprint(client.get(args[0],args[1],e= val (args[2]),eval(args[3]),)) File "/home/vadmin/cassandra-0.4.0-beta1/inter= face/gen-py/cassandra/Cassandra.py", line 182, in get return self.recv_get() File "/home/vadmin/cassandra-0.4.0-beta1/inter= face/gen-py/cassandra/Cassandra.py", line 210, in recv_get raise result.nfe ttypes.NotFoundException: NotFoundExcepti= on() Furthermore, querying the same column for= (some) other keys is successful when no matter which node it is directed = at. Looking at the log for the node that prod= uced the error from the query above: DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,618 CassandraServer.java (line 221) get DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,618 StorageProxy.java (line 420) strongread reading data for SliceByNamesReadCommand (table=3D'MockElementLibrary', key=3D'0401318uuuuruepwdcznr', columnParent=3D'QueryPath (columnFamilyName=3D'strings', superColumnName=3D'null', columnName=3D'null')', columns=3D[id,]) f= rom 38184@null DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,619 StorageProxy.java (line 427) strongread reading digest for SliceByNamesReadCommand (table=3D'MockElementLibrary', key=3D'0401318uuuuruepwdcznr', columnParent=3D'QueryPath (columnFamilyName=3D'strings', superColumnName=3D'null', columnName=3D'null')', columns=3D[id,]) f= rom 38185@192.168.133.129:7000 WARN [MESSAGE-SERIALIZER-POOL:4] 2009-08= -19 16:54:57,619 MessageSerializationTask.java (line 81) Exception was generated at : 08/19/2009 16:54:57 on thread MESSAGE-SERIALIZER-POOL:4 java.lang.NullPointerException at org.apache.cassandra.net.TcpConnection. (TcpConnection.java:83) at org.apache.cassandra.net.TcpConnectionMan= ager.getConnection (TcpConnectionManager.java:64) at org.apache.cassandra.net.MessagingService= .getConnection (MessagingService.java:306) at org.apache.cassandra.net.MessageSerializa= tionTask.run (MessageSerializationTask.java:66) at java.util.concurrent.ThreadPoolExecutor $Worker.runTask(ThreadPoolExecutor.java:8= 86) at java.util.concurrent.ThreadPoolExecutor $Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run (Thread.java:619) DEBUG [RESPONSE-STAGE:4] 2009-08-19 16:54:57,622 ResponseVerbHandler.java (line 38) Processing response on a callba= ck from 65B1E352-A0A3-1A7F-138B-9BEA3E1D787F@192.= 168.133.129 :7000 ERROR [pool-1-thread-22] 2009-08-19 16:55:02,619 Cassandra.java (line 608) Internal error processing get java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses fro= m 192.168.133.129:7000 . at org.apache.cassandra.service.CassandraSer= ver.readColumnFamily (CassandraServer.java:100) at org.apache.cassandra.service.CassandraSer= ver.get (CassandraServer.java:226) at org.apache.cassandra.service.Cassandra $Processor$get.process(Cassandra.java:602= ) at org.apache.cassandra.service.Cassandra $Processor.process(Cassandra.java:560) at org.apache.thrift.server.TThreadPoolServe= r $WorkerProcess.run (TThreadPoolServer.java:252) at java.util.concurrent.ThreadPoolExecutor $Worker.runTask(ThreadPoolExecutor.java:8= 86) at java.util.concurrent.ThreadPoolExecutor $Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run (Thread.java:619) Caused by: java.util.concurrent.TimeoutException: Operation timed out - received only 1 responses from 192.168.133.129:7000 . at org.apache.cassandra.service.QuorumRespon= seHandler.get (QuorumResponseHandler.java:86) at org.apache.cassandra.service.StorageProxy= .strongRead (StorageProxy.java:435) at org.apache.cassandra.service.StorageProxy= .readProtocol (StorageProxy.java:330) at org.apache.cassandra.service.CassandraSer= ver.readColumnFamily (CassandraServer.java:92) ... 7 more It appears to me that there is a timeout during the QuorumResponseHandler processing, stemming from a NullPointerException that occurs as part = of the read process. I suspect that this NullPointerException has something to do with the second DEBUG [pool-1-thread-22] comment regarding strongread ... from 38184@null. Does anyone know why this might be happen= ing? Thanks for any insight, Phil = --1__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E Content-type: text/html; charset=US-ASCII Content-Disposition: inline Content-transfer-encoding: quoted-printable

You should be aware that Cassandra doesn't support changing column f= amilies without reloading the data.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

junrao@almaden.ibm.com


3D"InactivePhillip Michalak ---= 08/20/2009 04:16:44 PM---I can't reproduce this after cleaning out the = data/ directories (and commit logs) and restarting t

=
3D=
From:
= 3D""
Phillip Michalak <phil.michalak@digitalreasoning.co= m>
3D=
To:

cassandra-user@incubator.apache.org
3D=
Date:
= 3D""
08/20/2009 04:16 PM
3D=
Subject:
3D""
Re: quorum read timeout





I can't reproduce this after cleaning out the data/ di= rectories (and commit logs) and restarting the cluster. I suspect that = I had gotten to an inconsistent state as I was playing around with the = keyspace column families.

I've updated the bug report with a comment to this eff= ect, but have left it open for someone with more experience to close it= out.

Cheers,
Phil

On Aug 19, 2009, at 5:14 PM, Phillip Michalak wrote:
      Sure thing. Filed as https://issues.apache.org/jira/browse/CASSANDRA-381

      Thanks,

      Phil

      On Aug 19, 2009, at 4:54 PM, Jonathan Ellis wrote:
          Looks like a bug in TcpConnectionManager. Can you= file a ticket?

          thanks,

          -Jonathan

          On Wed, Aug 19, 2009 at 2:49 PM, Phillip
          Michalak<
          phil.michalak@digitalreasoning.= com> wrote:
              It's cassandra-0.4-beta1.

              Thanks!
              Phil

              On Aug 19, 2009, at 4:43 PM, Jonathan Ellis wrote:
                  Is this 0.3 or 0.4/trunk?

                  On Wed, Aug 19, 2009 at 2:36 PM, Phillip
                  Michalak<phil.michalak@= digitalreasoning.com> wrote:

                      I'm running three Cassandra nodes in virtual machines.=
                      During a 'get' operation from Cassandra-remote directe= d at one of these
                      nodes, I'm receiving the following output

                      vadmin@vadmin:~/cassandra$ interface/gen-py/cassandra/= Cassandra-remote -h
                      192.168.133.130:9160 get 'MockElementLibrary' '0401318= uuuuruepwdcznr'
                      "ColumnPath('strings', None, 'id')" 2=
                      /usr/local/lib/python2.6/dist-packages/thrift/Thrift.p= y:58:
                      DeprecationWarning: BaseException.message has been dep= recated as of
                      Python
                      2.6
                      self.message =3D message
                      /usr/local/lib/python2.6/dist-packages/thrift/Thrift.p= y:99:
                      DeprecationWarning: BaseException.message has been dep= recated as of
                      Python
                      2.6
                      self.message =3D iprot.readString();
                      Traceback (most recent call last):
                      File "interface/gen-py/cassandra/Cassandra-remot= e", line 93, in <module>
                      pp.pprint(client.get(args[0],args[1],eval(args[2]),= eval(args[3]),))
                      File

                      "/home/vadmin/cassandra-0.4.0-beta1/interface/gen= -py/cassandra/Cassandra.py",
                      line 182, in get
                      return self.recv_get()
                      File

                      "/home/vadmin/cassandra-0.4.0-beta1/interface/gen= -py/cassandra/Cassandra.py",
                      line 201, in recv_get
                      raise x

                      thrift.Thrift.TApplicationException/usr/local/lib/pyth= on2.6/dist-packages/thrift/Thrift.py:76:
                      DeprecationWarning: BaseException.message has been dep= recated as of
                      Python
                      2.6
                      if self.message:
                      /usr/local/lib/python2.6/dist-packages/thrift/Thrift.p= y:77:
                      DeprecationWarning: BaseException.message has been dep= recated as of
                      Python
                      2.6
                      return self.message
                      : Internal error processing get

                      The same 'get' operation from Cassandra-remote directe= d at another of
                      these
                      nodes, yields 'normal' output
                      vadmin@vadmin:~/cassandra$ interface/gen-py/cassandra/= Cassandra-remote -h
                      192.168.133.129:9160 get 'MockElementLibrary' '0401318= uuuuruepwdcznr'
                      "ColumnPath('strings', None, 'id')" 2=
                      Traceback (most recent call last):
                      File "interface/gen-py/cassandra/Cassandra-remot= e", line 93, in <module>
                      pp.pprint(client.get(args[0],args[1],eval(args[2]),= eval(args[3]),))
                      File

                      "/home/vadmin/cassandra-0.4.0-beta1/interface/gen= -py/cassandra/Cassandra.py",
                      line 182, in get
                      return self.recv_get()
                      File

                      "/home/vadmin/cassandra-0.4.0-beta1/interface/gen= -py/cassandra/Cassandra.py",
                      line 210, in recv_get
                      raise result.nfe
                      ttypes.NotFoundException: NotFoundException() Furthermore, querying the same column for (some) other= keys is successful
                      when no matter which node it is directed at. Looking at the log for the node that produced the erro= r from the query
                      above:

                      DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,618 Cassa= ndraServer.java
                      (line
                      221) get
                      DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,618 Stora= geProxy.java (line
                      420) strongread reading data for
                      SliceByNamesReadCommand(table=3D'MockElementLibrary',<= /font>
                      key=3D'0401318uuuuruepwdcznr',
                      columnParent=3D'QueryPath(columnFamilyName=3D'strings'= ,
                      superColumnName=3D'null',
                      columnName=3D'null')', columns=3D[id,]) from 38184@nul= l
                      DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,619 Stora= geProxy.java (line
                      427) strongread reading digest for
                      SliceByNamesReadCommand(table=3D'MockElementLibrary',<= /font>
                      key=3D'0401318uuuuruepwdcznr',
                      columnParent=3D'QueryPath(columnFamilyName=3D'strings'= ,
                      superColumnName=3D'null',
                      columnName=3D'null')', columns=3D[id,]) from 38185@192.168.133.129:7000=
                      WARN [MESSAGE-SERIALIZER-POOL:4] 2009-08-19 16:54:57,= 619
                      MessageSerializationTask.java (line 81) Exception was = generated at :
                      08/19/2009 16:54:57 on thread MESSAGE-SERIALIZER-POOL:= 4
                      java.lang.NullPointerException
                      at
                      org.apache.cassandra.net.TcpConnection.<init>(Tc= pConnection.java:83)
                      at

                      org.apache.cassandra.net.TcpConnectionManager.getConne= ction(TcpConnectionManager.java:64)
                      at

                      org.apache.cassandra.net.MessagingService.getConnectio= n(MessagingService.java:306)
                      at

                      org.apache.cassandra.net.MessageSerializationTask.run(= MessageSerializationTask.java:66)
                      at

                      java.util.concurrent.ThreadPoolExecutor$Worker.runTask= (ThreadPoolExecutor.java:886)
                      at

                      java.util.concurrent.ThreadPoolExecutor$Worker.run(Thr= eadPoolExecutor.java:908)
                      at java.lang.Thread.run(Thread.java:619)=
                      DEBUG [RESPONSE-STAGE:4] 2009-08-19 16:54:57,622 Respo= nseVerbHandler.java
                      (line 38) Processing response on a callback
                      = from 65B= 1E352-A0A3-1A7F-138B-9BEA3E1D787F@192.168.133.129:7000
                      ERROR [pool-1-thread-22] 2009-08-19 16:55:02,619 Cassa= ndra.java (line
                      608)
                      Internal error processing get
                      java.lang.RuntimeException: java.util.concurrent.Timeo= utException:
                      Operation
                      timed out - received only 1 responses from 192.168.133= .129:7000 .
                      at

                      org.apache.cassandra.service.CassandraServer.readColum= nFamily(CassandraServer.java:100)
                      at

                      org.apache.cassandra.service.CassandraServer.get(Cassa= ndraServer.java:226)
                      at

                      org.apache.cassandra.service.Cassandra$Processor$get.p= rocess(Cassandra.java:602)
                      at

                      org.apache.cassandra.service.Cassandra$Processor.proce= ss(Cassandra.java:560)
                      at

                      org.apache.thrift.server.TThreadPoolServer$WorkerProce= ss.run(TThreadPoolServer.java:252)
                      at

                      java.util.concurrent.ThreadPoolExecutor$Worker.runTask= (ThreadPoolExecutor.java:886)
                      at

                      java.util.concurrent.ThreadPoolExecutor$Worker.run(Thr= eadPoolExecutor.java:908)
                      at java.lang.Thread.run(Thread.java:619)=
                      Caused by: java.util.concurrent.TimeoutException: Oper= ation timed out -
                      received only 1 responses from 192.168.133.129:7000 .<= /font>
                      at

                      org.apache.cassandra.service.QuorumResponseHandler.get= (QuorumResponseHandler.java:86)
                      at

                      org.apache.cassandra.service.StorageProxy.strongRead(S= torageProxy.java:435)
                      at

                      org.apache.cassandra.service.StorageProxy.readProtocol= (StorageProxy.java:330)
                      at

                      org.apache.cassandra.service.CassandraServer.readColum= nFamily(CassandraServer.java:92)
                      ... 7 more

                      It appears to me that there is a timeout during the Qu= orumResponseHandler
                      processing, stemming from a NullPointerException that = occurs as part of
                      the
                      read process. I suspect that this NullPointerException= has something to
                      do
                      with the second DEBUG [pool-1-thread-22] comment regar= ding strongread ...
                      from 38184@null.
                      Does anyone know why this might be happening? Thanks for any insight,
                      Phil



= --1__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E-- --0__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E Content-type: image/gif; name="graycol.gif" Content-Disposition: inline; filename="graycol.gif" Content-ID: <1__=07BBFC8BDF122D5E8f9e8a93df938@us.ibm.com> Content-transfer-encoding: base64 R0lGODlhEAAQAKECAMzMzAAAAP///wAAACH5BAEAAAIALAAAAAAQABAAAAIXlI+py+0PopwxUbpu ZRfKZ2zgSJbmSRYAIf4fT3B0aW1pemVkIGJ5IFVsZWFkIFNtYXJ0U2F2ZXIhAAA7 --0__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E Content-type: image/gif; name="ecblank.gif" Content-Disposition: inline; filename="ecblank.gif" Content-ID: <2__=07BBFC8BDF122D5E8f9e8a93df938@us.ibm.com> Content-transfer-encoding: base64 R0lGODlhEAABAIAAAAAAAP///yH5BAEAAAEALAAAAAAQAAEAAAIEjI8ZBQA7 --0__=07BBFC8BDF122D5E8f9e8a93df938690918c07BBFC8BDF122D5E--