Date: Wed, 17 Aug 2011 09:59:21 +1200
From: Teijo Holzer
To: user@cassandra.apache.org
CC: Yan Chunlu
Subject: Re: node restart taking too long

Hi,

yes, we saw exactly the same messages. We got rid of them by doing the following:

* Set all row & key caches in your CFs to 0 via cassandra-cli
* Kill Cassandra
* Remove all files in the saved_caches directory
* Start Cassandra
* Slowly bring back row & key caches (if desired; we left them off)

Cheers,
T.

On 16/08/11 23:35, Yan Chunlu wrote:
> I saw a lot of SliceQueryFilter messages after changing the log level to
> DEBUG. I just thought even bringing up a new node would be faster than
> starting the old one.....
> it is weird.
>
> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>
> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu wrote:
> > but it seems the row cache is cluster-wide; how will changing the row
> > cache affect read speed?
> >
> > On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis wrote:
> > > Or leave the row cache enabled but disable cache saving (and remove the
> > > one already on disk).
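Teijo's third step and Jonathan's "remove the one already on disk" come down to the same operation. The following is a minimal Python sketch of just that step, not a tool that ships with Cassandra: `clear_saved_caches` is a hypothetical helper, the `/cassandra/saved_caches` default is an assumption taken from the log paths quoted in this thread (check `saved_caches_directory` in your cassandra.yaml), and it assumes the node has already been stopped. It defaults to a dry run.

```python
from __future__ import annotations

from pathlib import Path

# Assumed location, taken from the log lines in this thread; check
# saved_caches_directory in cassandra.yaml before relying on it.
DEFAULT_SAVED_CACHES = Path("/cassandra/saved_caches")


def clear_saved_caches(cache_dir: Path = DEFAULT_SAVED_CACHES,
                       dry_run: bool = True) -> list[str]:
    """Hypothetical helper: delete every regular file in cache_dir.

    Only run this while Cassandra is stopped. With dry_run=True nothing is
    deleted; the function just returns the names it would remove.
    Subdirectories are left alone.
    """
    removed = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_file():
            if not dry_run:
                entry.unlink()
            removed.append(entry.name)
    return removed
```

Calling it once with `dry_run=True` lets you review the file list (e.g. `COMMENT-RowCache`) before calling it again with `dry_run=False` and then starting Cassandra.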
> > >
> > > On Sun, Aug 14, 2011 at 5:05 PM, aaron morton wrote:
> > > > > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> > > >
> > > > It's taking 29 minutes to load 200,000 rows in the row cache. That's a
> > > > pretty big row cache; I would suggest reducing or disabling it.
> > > > Background: http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> > > >
> > > > > and the server could not handle the load and crashed. After the restart,
> > > > > node 3 has not come back for more than 96 hours
> > > >
> > > > Crashed how?
> > > > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
> > > > Watch nodetool compactionstats to see when the Merkle tree build finishes
> > > > and nodetool netstats to see which CFs are streaming.
> > > >
> > > > Cheers
> > > >
> > > > -----------------
> > > > Aaron Morton
> > > > Freelance Cassandra Developer
> > > > @aaronmorton
> > > > http://www.thelastpickle.com
> > > >
> > > > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> > > > >
> > > > > I have 3 nodes and RF=3. When I was repairing node3, it seemed a lot of
> > > > > data was generated, and the server could not handle the load and crashed.
> > > > > After the restart, node 3 has not come back for more than 96 hours.
> > > > >
> > > > > With 34 GB of data, node 2 could restart and be back online within 1 hour.
> > > > >
> > > > > I am not sure what's wrong with node3. Should I restart node 3 again?
> > > > > Thanks!
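Aaron's 29-minute figure is the millisecond count in the quoted ColumnFamilyStore line converted to minutes. A small Python sketch, assuming the message wording exactly as it appears in this thread (other Cassandra versions may log it differently), that extracts the numbers and derives the per-key cost; the regex and `row_cache_load_stats` are illustrative names, not Cassandra APIs.

```python
import re

# Matches the cache-load message exactly as worded in this thread;
# other Cassandra versions may phrase it differently.
LOAD_RE = re.compile(
    r"completed loading \((?P<ms>\d+) ms; (?P<keys>\d+) keys\) "
    r"row cache for (?P<cf>\S+)"
)


def row_cache_load_stats(line: str) -> dict:
    """Return the CF name, total minutes and ms per key for a cache-load line."""
    m = LOAD_RE.search(line)
    if m is None:
        raise ValueError("not a row cache load line: " + line)
    ms = int(m.group("ms"))
    keys = int(m.group("keys"))
    return {
        "cf": m.group("cf"),
        "minutes": ms / 60_000,
        "ms_per_key": ms / keys,
    }


line = ("INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) "
        "completed loading (1744370 ms; 200000 keys) row cache for COMMENT")
stats = row_cache_load_stats(line)
print(f"{stats['cf']}: {stats['minutes']:.1f} min, {stats['ms_per_key']:.2f} ms/key")
# prints: COMMENT: 29.1 min, 8.72 ms/key
```

About 8.7 ms per cached key adds up quickly at 200,000 keys. For comparison, the CacheWriter line later in the thread reports saving the same 200,000 entries in 2535 ms, so nearly all of the startup delay is on the loading side.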
> > > > >
> > > > > Address  Status  State   Load       Owns    Token
> > > > >                                             113427455640312821154458202477256070484
> > > > > node1    Up      Normal  34.11 GB   33.33%  0
> > > > > node2    Up      Normal  31.44 GB   33.33%  56713727820156410577229101238628035242
> > > > > node3    Down    Normal  177.55 GB  33.33%  113427455640312821154458202477256070484
> > > > >
> > > > > The log shows it is still going on; I'm not sure why it is so slow:
> > > > >
> > > > > INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening /cassandra/data/COMMENT
> > > > > INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> > > > > INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547) completed loading (1744370 ms; 200000 keys) row cache for COMMENT
> > > > > INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275) reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> > > > > INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line 96) Saved COMMENT-RowCache (200000 items) in 2535 ms
> > >
> > > -- 
> > > Jonathan Ellis
> > > Project Chair, Apache Cassandra
> > > co-founder of DataStax, the source for professional Cassandra support
> > > http://www.datastax.com
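The Token column in Yan's ring output matches the common way of spacing initial tokens evenly for RandomPartitioner, whose token space runs from 0 to 2**127: node i gets i * (2**127 // N). The thread does not say how these tokens were chosen, so treat that as an assumption; this Python sketch (function names are illustrative) reproduces the three tokens for N = 3 and the roughly equal 33.33% ownership shown for each node.

```python
from __future__ import annotations

# RandomPartitioner places tokens on the integer ring [0, 2**127).
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial tokens: node i gets i * (2**127 // node_count)."""
    step = TOKEN_SPACE // node_count
    return [i * step for i in range(node_count)]


def ownership(tokens: list[int]) -> list[float]:
    """Fraction of the ring owned by each token, in ascending token order.

    Each node owns the range from the previous token (exclusive) up to its
    own token (inclusive); the lowest token wraps around past 2**127.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [1.0]
    shares = []
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this is the highest token
        shares.append(((token - previous) % TOKEN_SPACE) / TOKEN_SPACE)
    return shares


tokens = balanced_tokens(3)
for token, share in zip(tokens, ownership(tokens)):
    print(f"{token:>39}  {share:.2%}")
```

Equal ownership only describes how the token ranges are split. With RF=3 on a three-node ring every node stores a replica of all the data, so the Load figures would normally be similar; node3's 177.55 GB against roughly 31 to 34 GB on the others is consistent with Yan's report that the repair generated a lot of extra data.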