The following line from the log likely shows the problem. It looks like that a datanode is somehow null, which shouldn't happen.


Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

junrao@almaden.ibm.com


Phillip Michalak <phil.michalak@digitalreasoning.com> wrote on 08/19/2009 02:36:13 PM:

> [image removed]

>
> quorum read timeout

>
> Phillip Michalak

>
> to:

>
> cassandra-user

>
> 08/19/2009 02:37 PM

>
> Please respond to cassandra-user

>
> I'm running three Cassandra nodes in virtual machines.

>
> During a 'get' operation from Cassandra-remote directed at one of
> these nodes, I'm receiving the following output

>
> vadmin@vadmin:~/cassandra$ interface/gen-py/cassandra/Cassandra-
> remote -h 192.168.133.130:9160 get 'MockElementLibrary'
> '0401318uuuuruepwdcznr' "ColumnPath('strings', None, 'id')" 2

> /usr/local/lib/python2.6/dist-packages/thrift/Thrift.py:58:
> DeprecationWarning: BaseException.message has been deprecated as of Python 2.6

>   self.message = message
> /usr/local/lib/python2.6/dist-packages/thrift/Thrift.py:99:
> DeprecationWarning: BaseException.message has been deprecated as of Python 2.6

>   self.message = iprot.readString();
> Traceback (most recent call last):
>   File "interface/gen-py/cassandra/Cassandra-remote", line 93, in <module>
>     pp.pprint(client.get(args[0],args[1],eval(args[2]),eval(args[3]),))
>   File "/home/vadmin/cassandra-0.4.0-beta1/interface/gen-py/
> cassandra/Cassandra.py", line 182, in get

>     return self.recv_get()
>   File "/home/vadmin/cassandra-0.4.0-beta1/interface/gen-py/
> cassandra/Cassandra.py", line 201, in recv_get

>     raise x
> thrift.Thrift.TApplicationException/usr/local/lib/python2.6/dist-
> packages/thrift/Thrift.py:76: DeprecationWarning:
> BaseException.message has been deprecated as of Python 2.6

>   if self.message:
> /usr/local/lib/python2.6/dist-packages/thrift/Thrift.py:77:
> DeprecationWarning: BaseException.message has been deprecated as of Python 2.6

>   return self.message
> : Internal error processing get
>
> The same 'get' operation from Cassandra-remote directed at another
> of these nodes, yields 'normal' output

>
> vadmin@vadmin:~/cassandra$ interface/gen-py/cassandra/Cassandra-
> remote -h 192.168.133.129:9160 get 'MockElementLibrary'
> '0401318uuuuruepwdcznr' "ColumnPath('strings', None, 'id')" 2

> Traceback (most recent call last):
>   File "interface/gen-py/cassandra/Cassandra-remote", line 93, in <module>
>     pp.pprint(client.get(args[0],args[1],eval(args[2]),eval(args[3]),))
>   File "/home/vadmin/cassandra-0.4.0-beta1/interface/gen-py/
> cassandra/Cassandra.py", line 182, in get

>     return self.recv_get()
>   File "/home/vadmin/cassandra-0.4.0-beta1/interface/gen-py/
> cassandra/Cassandra.py", line 210, in recv_get

>     raise result.nfe
> ttypes.NotFoundException: NotFoundException()
>
> Furthermore, querying the same column for (some) other keys is
> successful when no matter which node it is directed at.

> Looking at the log for the node that produced the error from the query above:
>
> DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,618
> CassandraServer.java (line 221) get

> DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,618 StorageProxy.java
> (line 420) strongread reading data for SliceByNamesReadCommand
> (table='MockElementLibrary', key='0401318uuuuruepwdcznr',
> columnParent='QueryPath(columnFamilyName='strings',
> superColumnName='null', columnName='null')', columns=[id,]) from 38184@null

> DEBUG [pool-1-thread-22] 2009-08-19 16:54:57,619 StorageProxy.java
> (line 427) strongread reading digest for SliceByNamesReadCommand
> (table='MockElementLibrary', key='0401318uuuuruepwdcznr',
> columnParent='QueryPath(columnFamilyName='strings',
> superColumnName='null', columnName='null')', columns=[id,]) from
> 38185@192.168.133.129:7000

>  WARN [MESSAGE-SERIALIZER-POOL:4] 2009-08-19 16:54:57,619
> MessageSerializationTask.java (line 81) Exception was generated at :
> 08/19/2009 16:54:57 on thread MESSAGE-SERIALIZER-POOL:4

>
> java.lang.NullPointerException

>         at org.apache.cassandra.net.TcpConnection.<init>
> (TcpConnection.java:83)

>         at
> org.apache.cassandra.net.TcpConnectionManager.getConnection
> (TcpConnectionManager.java:64)

>         at org.apache.cassandra.net.MessagingService.getConnection
> (MessagingService.java:306)

>         at org.apache.cassandra.net.MessageSerializationTask.run
> (MessageSerializationTask.java:66)

>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
> (ThreadPoolExecutor.java:886)

>         at java.util.concurrent.ThreadPoolExecutor$Worker.run
> (ThreadPoolExecutor.java:908)

>         at java.lang.Thread.run(Thread.java:619)
>
> DEBUG [RESPONSE-STAGE:4] 2009-08-19 16:54:57,622
> ResponseVerbHandler.java (line 38) Processing response on a callback from
> 65B1E352-A0A3-1A7F-138B-9BEA3E1D787F@192.168.133.129:7000

> ERROR [pool-1-thread-22] 2009-08-19 16:55:02,619 Cassandra.java
> (line 608) Internal error processing get

> java.lang.RuntimeException: java.util.concurrent.TimeoutException:
> Operation timed out - received only 1 responses from 192.168.133.129:7000 .

>         at
> org.apache.cassandra.service.CassandraServer.readColumnFamily
> (CassandraServer.java:100)

>         at org.apache.cassandra.service.CassandraServer.get
> (CassandraServer.java:226)

>         at org.apache.cassandra.service.Cassandra$Processor
> $get.process(Cassandra.java:602)

>         at org.apache.cassandra.service.Cassandra$Processor.process
> (Cassandra.java:560)

>         at org.apache.thrift.server.TThreadPoolServer
> $WorkerProcess.run(TThreadPoolServer.java:252)

>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
> (ThreadPoolExecutor.java:886)

>         at java.util.concurrent.ThreadPoolExecutor$Worker.run
> (ThreadPoolExecutor.java:908)

>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.util.concurrent.TimeoutException: Operation timed
> out - received only 1 responses from 192.168.133.129:7000 .

>         at org.apache.cassandra.service.QuorumResponseHandler.get
> (QuorumResponseHandler.java:86)

>         at org.apache.cassandra.service.StorageProxy.strongRead
> (StorageProxy.java:435)

>         at org.apache.cassandra.service.StorageProxy.readProtocol
> (StorageProxy.java:330)

>         at
> org.apache.cassandra.service.CassandraServer.readColumnFamily
> (CassandraServer.java:92)

>         ... 7 more
>
> It appears to me that there is a timeout during the
> QuorumResponseHandler processing, stemming from a
> NullPointerException that occurs as part of the read process. I
> suspect that this NullPointerException has something to do with the
> second DEBUG [pool-1-thread-22] comment regarding strongread ...
> from 38184@null.

>
> Does anyone know why this might be happening?

>
> Thanks for any insight,

> Phil