From: Yan Chunlu
Date: Sun, 1 Jul 2012 23:14:58 +0800
Subject: cassandra halt after started minutes later
To: cassandra-user@incubator.apache.org

I have a three-node cluster running Cassandra 1.0.2. Today I hit a very strange problem: two of the Cassandra nodes (let's call them B and C) suddenly started consuming a lot of CPU, and it turned out that for some reason the "java" binary just wouldn't run. I was using OpenJDK 1.6.0_18, so I switched to the Sun JDK, which works okay.

After that, node A stopped working with the same problem. I installed the Sun JDK there too, and then it was okay. But minutes later B stopped working again: about 5-10 minutes after Cassandra started, it stopped responding to connections. I can't connect to port 9160, and nodetool doesn't return either.

I have turned on DEBUG logging but don't see much useful information. The last lines of the log on node B are as follows:

DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 106) digests verified
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,830 RowDigestResolver.java (line 110) resolve: 0 ms.
DEBUG [pool-2-thread-72] 2012-07-01 07:45:42,831 StorageProxy.java (line 694) Read: 5 ms.
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3
DEBUG [Thread-8] 2012-07-01 07:45:42,831 IncomingTcpConnection.java (line 116) Version is now 3

This problem is really driving me crazy, since I just don't know what happened or how to debug it. I tried killing node A and restarting it, and then node B halted. After I restarted B, node C went down.

One thing that may be related: the log timestamps on node B don't match the system time (A and C are okay). The date command on node B shows:

Sun Jul 1 23:10:57 CST 2012 (system time)

But as you may have noticed, the timestamps in the log messages above are "2012-07-01 07:45:XX". The system time is right. I'm just not sure why Cassandra's log file shows the wrong time, since I don't recall Cassandra having any timezone settings.
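One way to narrow down the timestamp mismatch is sketched below. This rests on an assumption I have not verified against node B: that the log uses the stock log4j layout. The lines above look like that default pattern (%5p [%t] %d{ISO8601} %F (line %L) %m%n). If so, the timestamps come from the JVM's default timezone, not from any Cassandra setting. The TzCheck class is hypothetical and not part of Cassandra. It prints what the default zone resolves to. To make the comparison with date meaningful, compile and run it with the same java binary and user that start Cassandra on node B.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical diagnostic (not part of Cassandra): shows which timezone the JVM
// picks by default, which is what log4j's %d{ISO8601} timestamps are rendered in.
public class TzCheck {
    public static void main(String[] args) {
        // Value of the user.timezone system property; set explicitly via
        // -Duser.timezone=..., otherwise it may be empty or filled in by the JVM.
        System.out.println("user.timezone property: " + System.getProperty("user.timezone"));
        // The TZ environment variable, one of the inputs the JVM may consult on Linux.
        System.out.println("TZ env var: " + System.getenv("TZ"));

        TimeZone tz = TimeZone.getDefault();
        // getRawOffset() is the standard offset from UTC in milliseconds (excludes DST).
        System.out.println("JVM default zone: " + tz.getID()
                + " (raw offset " + tz.getRawOffset() / 3600000.0 + " h)");

        // Same date/time layout as the log lines, rendered in the JVM default zone.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        System.out.println("now (JVM default zone): " + fmt.format(new Date()));
    }
}
```

Usage: javac TzCheck.java && java TzCheck. If the zone it reports differs from the CST that date shows, the JVM is resolving its default zone from a different source than the shell does. The standard -Duser.timezone=... JVM option is one way to pin it explicitly. Whether this is related to the hangs at all is only a guess.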