Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 393F5D54A for ; Sun, 1 Jul 2012 16:11:56 +0000 (UTC) Received: (qmail 39145 invoked by uid 500); 1 Jul 2012 16:11:54 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 39122 invoked by uid 500); 1 Jul 2012 16:11:54 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 39114 invoked by uid 99); 1 Jul 2012 16:11:53 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 01 Jul 2012 16:11:53 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=FSL_RCVD_USER,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of david.daeschler@gmail.com designates 74.125.82.44 as permitted sender) Received: from [74.125.82.44] (HELO mail-wg0-f44.google.com) (74.125.82.44) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 01 Jul 2012 16:11:46 +0000 Received: by wgbdr13 with SMTP id dr13so3571627wgb.25 for ; Sun, 01 Jul 2012 09:11:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=QmreC8D4T+SlQ/k7Tkgib/24wPeQvUZj1k3N97NQhvU=; b=n61ZmMRDQ+OagT48zy2tzl6YbiFzC0yVRc468hbD5JUvFsJ+nMdWktOkxQapqMiBHR +vDXT1FQLNjCb7do4oRA08qoKhL2Dh/0k5XtNqz0KmV5KCBF+raitzEEY2iuxckA7CFp XqebHkcJQpI0K4/fay1TATFAxXJR2CBKDP6j32JV6ZK5rJARojCeaAcIL9BvR6H8XUU/ 9FTBvJnTFp3/M8VijdFtyeXvWFMkwh9WkTyWGvaGNbOvg0xy+H0LSX1AOwC08Fy3jWRu KBZi3c5UtgLKB32RpAI8W9D/i+0J6K4SkZduRE0LBji7Vp1zxZZ9Y/mW0gvPFiEtaTXg R2KQ== MIME-Version: 1.0 Received: by 10.216.228.232 with SMTP id f82mr428482weq.211.1341159083415; Sun, 01 Jul 2012 09:11:23 -0700 (PDT) Received: by 10.180.107.69 with HTTP; Sun, 1 Jul 2012 09:11:23 -0700 (PDT) In-Reply-To: References: Date: Sun, 1 Jul 2012 12:11:23 -0400 Message-ID: Subject: Re: cassandra halt after started minutes later From: David Daeschler To: user@cassandra.apache.org Content-Type: text/plain; charset=ISO-8859-1 This looks like the problem a bunch of us were having yesterday that isn't cleared without a reboot or a date command. It seems to be related to the leap second that was added between the 30th June and the 1st of July. See the mailing list thread with subject "High CPU usage as of 8pm eastern time" If you are seeing high CPU usage and a stall after restarting cassandra still, and you are on Linux, try: date; date `date +"%m%d%H%M%C%y.%S"`; date; In a terminal and see if everything starts working again. I hope this helps. -- David Daeschler On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu wrote: > adjust the timezone of java by -Duser.timezone and the timezone of > cassandra is the same with system(Debian 6.0). > > after restart cassandra I found the following error message in the log file > of node B. after about 2 minutes later, node C stop responding.... > > the error log of node B: > > Thrift transport error occurred during processing of message. > org.apache.thrift.transport.TTransportException > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) > at > org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) > at > org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) > at > org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) > at > org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877) > at > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > > > > the log info in node C: > > > DEBUG [MutationStage:25] 2012-07-01 23:29:42,909 RowMutationVerbHandler.java > (line 60) RowMutation(keyspace='spark', > key='39373438366235383638373631353532643133393334633435326333323634373131656462306139', > modifications=[ColumnFamily(permacache > [76616c7565:false:67906@1341156582948365,])]) applied. Sending response to > 79529@/192.168.1.129 > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 CassandraServer.java (line > 523) insert > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line > 172) Mutations/ConsistencyLevel are [RowMutation(keyspace='spark', > key='636f6d6d656e74735f706172656e74735f32373232343938', > modifications=[ColumnFamily(permacache > [76616c7565:false:6@1341156582953843,])])]/QUORUM > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line > 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to > /192.168.1.40 > DEBUG [pool-2-thread-209] 2012-07-01 23:29:42,913 StorageProxy.java (line > 301) insert writing key 636f6d6d656e74735f706172656e74735f32373232343938 to > /192.168.1.129 > DEBUG [Thread-8] 2012-07-01 23:29:42,913 IncomingTcpConnection.java (line > 116) Version is now 3 > DEBUG [RequestResponseStage:27] 2012-07-01 23:29:42,913 > ResponseVerbHandler.java (line 44) Processing response on a callback from > 50050@/192.168.1.129 > DEBUG [Thread-12] 2012-07-01 23:29:42,914 IncomingTcpConnection.java (line > 116) Version is now 3 > DEBUG [RequestResponseStage:29] 2012-07-01 23:29:42,914 > ResponseVerbHandler.java (line 44) Processing response on a callback from > 50051@/192.168.1.40 > DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line > 116) Version is now 3 > > > > On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu wrote: >> >> I have a three node cluster running 1.0.2, today there's a very strange >> problem that suddenly two of cassandra node(let's say B and C) was costing >> a lot of cpu, turned out for some reason the "java" binary just dont run.... >> I am using OpenJDK1.6.0_18, so I switched to "sun jdk", which works okay. >> >> after that node A stop working... same problem, I install "sun jdk", then >> it's okay. but minutes later, B stop working again, about 5-10 minutes later >> after the cassandra started, it stop responding connections, I can't access >> 9160 and nodetool dont return either. >> >> I have turned on DEBUG and dont see much useful information, the last rows >> on node B are as belows: >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java >> (line 65) resolving 2 responses >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java >> (line 106) digests verified >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java >> (line 110) resolve: 0 ms. >> DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line >> 694) Read: 5 ms. >> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line >> 116) Version is now 3 >> DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line >> 116) Version is now 3 >> >> >> this problem is really driving me crazy since I just dont know what >> happened, and how to debug it, I tried to kill node A and restart it, then >> node B halt, after I restart B, then node C goes down...... >> >> >> one thing may related is that the log time on node B is not the same with >> the system time(A and C are okay). >> >> while date on node B shows: >> Sun Jul 1 23:10:57 CST 2012 (system time) >> >> but you may noticed that the time is "2012-07-01 07:45:XX" in those above >> log message. the system time is right, just not sure why cassandra's log >> file shows the wrong time, I didn't recall cassandra have timezone >> settings..... >> >> >> >> >