adjust the timezone of java by  -Duser.timezone   and the timezone of cassandra is the same with system(Debian 6.0).

after restart cassandra I found the following error message in the log file of node B. after about 2 minutes later, node C stop responding....

the error log of node B:

Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)



the log info in node C:


DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='spark', key='39373438366235383638373631353532643133393334633435326333323634373131656462306139', modifications=[ColumnFamily(permacache [76616c7565:false:67906@1341156582948365,])]) applied.  Sending response to 79529@/192.168.1.129
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line 523) insert
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark', key='636f6d6d656e74735f706172656e74735f32373232343938', modifications=[ColumnFamily(permacache [76616c7565:false:6@1341156582953843,])])]/QUORUM
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.40
DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.129
DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913 ResponseVerbHandler.java (line 44) Processing response on a callback from 50050@/192.168.1.129
DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914 ResponseVerbHandler.java (line 44) Processing response on a callback from 50051@/192.168.1.40
DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line 116) Version is now 3



On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu <springrider@gmail.com> wrote:
I have a three node cluster running 1.0.2, today there's a very strange problem that suddenly two of cassandra  node(let's say B and C) was costing a lot of cpu, turned out for some reason the "java" binary just dont run.... I am using OpenJDK1.6.0_18, so I switched to "sun jdk", which works okay.

after that node A stop working... same problem, I install "sun jdk", then it's okay. but minutes later, B stop working again, about 5-10 minutes later after the cassandra started, it stop responding connections, I can't access 9160 and nodetool dont return either.

I have turned on DEBUG and dont see much useful information, the last rows on node B are as belows:
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms.
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms.
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3


this problem is really driving me crazy since I just dont know what happened, and how to debug it, I tried to kill node A and restart it, then node B halt, after I restart B, then node C goes down......


one thing may related is that the log time on node B is not the same with the system time(A and C are okay).

while date on node B shows:
Sun Jul  1 23:10:57 CST 2012 (system time)

but you may noticed that the time is "2012-07-01 07:45:XX" in those above log message.  the system time is right, just not sure why cassandra's log file shows the wrong time, I didn't recall cassandra have timezone settings.....