From: aaron morton
To: user@cassandra.apache.org
Subject: Re: node restart taking too long
Date: Mon, 15 Aug 2011 10:05:17 +1200
Message-Id: <3066FEE2-CE8D-4B1D-BEB9-75812BAFE9F7@thelastpickle.com>

> INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT

It's taking 29 minutes to load 200,000 rows into the row cache. That's a pretty big row cache; I would suggest reducing or disabling it.

Background: http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra

> and server can not afford the load then crashed. after come back, node 3 can not return for more than 96 hours

Crashed how?

You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280

Watch nodetool compactionstats to see when the Merkle tree build finishes, and nodetool netstats to see which CFs are streaming.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15 Aug 2011, at 04:23, Yan Chunlu wrote:

> I have 3 nodes and RF=3. When I was repairing node3, it seems a lot of data was generated, and the server could not afford the load and then crashed.
> After coming back, node 3 has not returned for more than 96 hours.
>
> For 34GB of data, node 2 could restart and be back online within 1 hour.
>
> I am not sure what's wrong with node3. Should I restart node 3 again? Thanks!
>
> Address  Status  State   Load       Owns    Token
>                                             113427455640312821154458202477256070484
> node1    Up      Normal  34.11 GB   33.33%  0
> node2    Up      Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
> node3    Down    Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
>
> The log shows it is still going on; not sure why it is so slow:
>
> INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
> INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
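
For anyone wanting to check the "29 minutes" figure in the reply, here is a minimal sketch that pulls the elapsed time and key count out of the "completed loading" log line quoted in the thread and converts them to minutes and milliseconds per key. The regular expression and the function name summarize_cache_load are assumptions made for illustration; they are written against the one log line shown here, and other Cassandra versions may format the message differently.

```python
import re

# Log line copied from the thread.
LOG_LINE = (
    "INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) "
    "completed loading (1744370 ms; 200000 keys) row cache for COMMENT"
)

# Assumed pattern, derived only from the single message quoted in the thread.
PATTERN = re.compile(
    r"completed loading \((?P<ms>\d+) ms; (?P<keys>\d+) keys\) "
    r"(?P<kind>\w+) cache for (?P<cf>\S+)"
)


def summarize_cache_load(line):
    """Return (column family, cache kind, minutes, ms per key), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    elapsed_ms = int(match.group("ms"))
    keys = int(match.group("keys"))
    # Guard against a zero key count so an empty cache does not raise.
    ms_per_key = elapsed_ms / keys if keys else 0.0
    return match.group("cf"), match.group("kind"), elapsed_ms / 60000.0, ms_per_key


if __name__ == "__main__":
    cf, kind, minutes, ms_per_key = summarize_cache_load(LOG_LINE)
    print(f"{cf} {kind} cache: {minutes:.1f} min total, {ms_per_key:.2f} ms per key")
```

On the line from this thread that works out to roughly 29 minutes in total, or a little under 9 ms per cached row, which is the startup cost the reply suggests avoiding by shrinking or disabling the row cache.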