cassandra-user mailing list archives

From Yan Chunlu <springri...@gmail.com>
Subject Re: cassandra halt after started minutes later
Date Sun, 01 Jul 2012 16:41:54 GMT
Huge thanks! It was the leap second problem!

finally I can go to bed....
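
For anyone who finds this thread later, here is a rough sketch of what the one-liner quoted below does, split into steps. It formats the current time as MMDDhhmmCCYY.ss and hands that string straight back to `date`, so the clock is set to the value it already has; on the affected Linux boxes that seems to be enough to clear the stall. The function name `clock_reset_arg` and the `APPLY` variable are only my own illustration and are not part of David's command.

```shell
#!/bin/sh
# Hypothetical helper: print the current time in the MMDDhhmmCCYY.ss form
# that `date` accepts as a "set the clock" argument on Linux.
clock_reset_arg() {
    date +"%m%d%H%M%C%y.%S"
}

arg=$(clock_reset_arg)
echo "before: $(date)"
if [ "${APPLY:-0}" = "1" ]; then
    # Needs root. Sets the clock to (almost exactly) the value it already has.
    date "$arg"
else
    # Default is a dry run so nothing changes by accident.
    echo "dry run, would execute: date $arg"
fi
echo "after:  $(date)"
```

Run it once without `APPLY` to see the command it would issue, then as root with `APPLY=1` to actually set the clock.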

On Mon, Jul 2, 2012 at 12:11 AM, David Daeschler <david.daeschler@gmail.com> wrote:

> This looks like the problem a bunch of us were having yesterday that
> isn't cleared without a reboot or a date command. It seems to be
> related to the leap second that was added between the 30th June and
> the 1st of July.
>
> See the mailing list thread with subject "High CPU usage as of 8pm eastern
> time"
>
> If you are seeing high CPU usage and a stall after restarting
> cassandra still, and you are on Linux, try:
>
> date; date `date +"%m%d%H%M%C%y.%S"`; date;
>
> In a terminal and see if everything starts working again.
>
> I hope this helps.
> --
> David Daeschler
>
>
>
> On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu <springrider@gmail.com> wrote:
> > I adjusted Java's timezone with -Duser.timezone, so Cassandra's timezone
> > is now the same as the system's (Debian 6.0).
> >
> > After restarting Cassandra I found the following error message in the log
> > file of node B. About 2 minutes later, node C stopped responding....
> >
> > the error log of node B:
> >
> > Thrift transport error occurred during processing of message.
> > org.apache.thrift.transport.TTransportException
> > at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> > at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> > at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> > at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> > at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
> > at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> >
> >
> > the log info in node C:
> >
> >
> > DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='spark', key='39373438366235383638373631353532643133393334633435326333323634373131656462306139', modifications=[ColumnFamily(permacache [76616c7565:false:67906@1341156582948365,])]) applied.  Sending response to 79529@/192.168.1.129
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line 523) insert
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark', key='636f6d6d656e74735f706172656e74735f32373232343938', modifications=[ColumnFamily(permacache [76616c7565:false:6@1341156582953843,])])]/QUORUM
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.40
> > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to /192.168.1.129
> > DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line 116) Version is now 3
> > DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913 ResponseVerbHandler.java (line 44) Processing response on a callback from 50050@/192.168.1.129
> > DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line 116) Version is now 3
> > DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914 ResponseVerbHandler.java (line 44) Processing response on a callback from 50051@/192.168.1.40
> > DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line 116) Version is now 3
> >
> >
> >
> > On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu <springrider@gmail.com> wrote:
> >>
> >> I have a three-node cluster running 1.0.2. Today a very strange problem
> >> appeared: two of the Cassandra nodes (let's say B and C) were suddenly
> >> using a lot of CPU, and it turned out that for some reason the "java"
> >> binary just doesn't run.... I was using OpenJDK 1.6.0_18, so I switched
> >> to the "sun jdk", which works okay.
> >>
> >> After that node A stopped working... same problem. I installed the
> >> "sun jdk" and then it was okay. But minutes later B stopped working again:
> >> about 5-10 minutes after Cassandra started, it stopped responding to
> >> connections. I can't access port 9160 and nodetool doesn't return either.
> >>
> >> I have turned on DEBUG and don't see much useful information. The last
> >> rows on node B are as follows:
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms.
> >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms.
> >> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
> >> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
> >>
> >>
> >> This problem is really driving me crazy since I just don't know what
> >> happened or how to debug it. I tried to kill node A and restart it, then
> >> node B halted; after I restarted B, node C went down......
> >>
> >>
> >> One thing that may be related is that the log time on node B is not the
> >> same as the system time (A and C are okay).
> >>
> >> While date on node B shows:
> >> Sun Jul  1 23:10:57 CST 2012 (system time)
> >>
> >> you may have noticed that the time is "2012-07-01 07:45:XX" in the log
> >> messages above. The system time is right; I'm just not sure why
> >> Cassandra's log file shows the wrong time. I don't recall Cassandra
> >> having timezone settings.....
> >>
> >>
> >>
> >>
> >
>
