cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Daeschler <david.daesch...@gmail.com>
Subject Re: cassandra halt after started minutes later
Date Sun, 01 Jul 2012 16:11:23 GMT
This looks like the problem a bunch of us were having yesterday that
isn't cleared without a reboot or a date command. It seems to be
related to the leap second that was added between the 30th June and
the 1st of July.

See the mailing list thread with subject "High CPU usage as of 8pm eastern time"

If you are seeing high CPU usage and a stall after restarting
cassandra still, and you are on Linux, try:

date; date `date +"%m%d%H%M%C%y.%S"`; date;

In a terminal and see if everything starts working again.

I hope this helps.
-- 
David Daeschler



On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu <springrider@gmail.com> wrote:
> adjust the timezone of java by  -Duser.timezone   and the timezone of
> cassandra is the same with system(Debian 6.0).
>
> after restart cassandra I found the following error message in the log file
> of node B. after about 2 minutes later, node C stop responding....
>
> the error log of node B:
>
> Thrift transport error occurred during processing of message.
> org.apache.thrift.transport.TTransportException
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
> at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
>
>
>
> the log info in node C:
>
>
> DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java
> (line 60) RowMutation(keyspace='spark',
> key='39373438366235383638373631353532643133393334633435326333323634373131656462306139',
> modifications=[ColumnFamily(permacache
> [76616c7565:false:67906@1341156582948365,])]) applied.  Sending response to
> 79529@/192.168.1.129
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line
> 523) insert
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark',
> key='636f6d6d656e74735f706172656e74735f32373232343938',
> modifications=[ColumnFamily(permacache
> [76616c7565:false:6@1341156582953843,])])]/QUORUM
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
> /192.168.1.40
> DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line
> 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to
> /192.168.1.129
> DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line
> 116) Version is now 3
> DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913
> ResponseVerbHandler.java (line 44) Processing response on a callback from
> 50050@/192.168.1.129
> DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line
> 116) Version is now 3
> DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914
> ResponseVerbHandler.java (line 44) Processing response on a callback from
> 50051@/192.168.1.40
> DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line
> 116) Version is now 3
>
>
>
> On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu <springrider@gmail.com> wrote:
>>
>> I have a three node cluster running 1.0.2, today there's a very strange
>> problem that suddenly two of cassandra  node(let's say B and C) was costing
>> a lot of cpu, turned out for some reason the "java" binary just dont run....
>> I am using OpenJDK1.6.0_18, so I switched to "sun jdk", which works okay.
>>
>> after that node A stop working... same problem, I install "sun jdk", then
>> it's okay. but minutes later, B stop working again, about 5-10 minutes later
>> after the cassandra started, it stop responding connections, I can't access
>> 9160 and nodetool dont return either.
>>
>> I have turned on DEBUG and dont see much useful information, the last rows
>> on node B are as belows:
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
>> (line 65) resolving 2 responses
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
>> (line 106) digests verified
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java
>> (line 110) resolve: 0 ms.
>> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line
>> 694) Read: 5 ms.
>> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
>> 116) Version is now 3
>> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line
>> 116) Version is now 3
>>
>>
>> this problem is really driving me crazy since I just dont know what
>> happened, and how to debug it, I tried to kill node A and restart it, then
>> node B halt, after I restart B, then node C goes down......
>>
>>
>> one thing may related is that the log time on node B is not the same with
>> the system time(A and C are okay).
>>
>> while date on node B shows:
>> Sun Jul  1 23:10:57 CST 2012 (system time)
>>
>> but you may noticed that the time is "2012-07-01 07:45:XX" in those above
>> log message.  the system time is right, just not sure why cassandra's log
>> file shows the wrong time, I didn't recall cassandra have timezone
>> settings.....
>>
>>
>>
>>
>

Mime
View raw message