Subject: Re: CL1 and CLQ with 5 nodes cluster and 3 alives node
From: aaron morton
Date: Mon, 22 Jul 2013 08:38:57 +1200
To: user@cassandra.apache.org, "cbertu81@libero.it"

> I'm experiencing some problems after 3 years of cassandra in production (from
> 0.6 to 1.0.6) -- for 2 times in 3 weeks 2 nodes crashed with OutOfMemory
> Exception.

Take a look at how many rows you have and the size of the bloom filters. You may have grown :)

If you have more than 500 million rows you may want to check the bloom_filter_fp_chance. The old default was 0.000744 and the new (post 1.) number is 0.01 for size-tiered compaction.

> Now a question -- why with 2 nodes offline all my application stop providing
> the service, even when a Consistency Level One read is invoked?
> I'd expected this behaviour:

What error did the client get, and what client are you using?

It also depends on if and how the node fails. The later versions try to shut down when there is an OOM; I'm not sure what 1.0 does.

If the node went into a zombie state, the clients may have been timing out.
They should then move on to another node.

If it had started shutting down, the client should have gotten some immediate errors.

Cheers

-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/07/2013, at 5:02 PM, cbertu81@libero.it wrote:

> Hi all,
> I'm experiencing some problems after 3 years of cassandra in production (from
> 0.6 to 1.0.6) -- for 2 times in 3 weeks 2 nodes crashed with OutOfMemory
> Exception.
> In the log I can read the warning about the little heap available ... now I'm
> increasing my RAM a little, my Java heap (1/4 of the RAM), and reducing the
> row size and memtable thresholds. Other tips?
>
> Now a question -- why with 2 nodes offline all my application stop providing
> the service, even when a Consistency Level One read is invoked?
> I'd expected this behaviour:
>
> CL1 operations keep working
> more than 80% of CLQ operations working (nodes offline were 2 and 5 in a
> clockwise key distribution; only writes to the fifth node should impact node 2)
> most of all CLALL operations (that I don't use) failing
>
> The situation instead was that I had ALL services stop responding, throwing a
> TTransportException ...
>
> Thanks in advance
>
> Carlo
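Carlo's expected behaviour for CL ONE, QUORUM and ALL can be checked with a little arithmetic. The following is a minimal Python sketch under assumptions the thread does not state: a replication factor of 3, SimpleStrategy-style placement where each key range lives on its primary node plus the next two nodes clockwise, and evenly sized ranges. The function names are hypothetical and not part of any Cassandra client. QUORUM needs floor(RF/2) + 1 live replicas.

```python
# Sketch: which consistency levels can be met on a small ring with some nodes down.
# ASSUMPTIONS (not from the thread): replication factor 3, SimpleStrategy-style
# placement (primary node plus the next RF-1 nodes clockwise), equal-sized ranges.
# Nodes are numbered 1..N in clockwise order, as in the question.


def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def replicas(primary: int, n_nodes: int, rf: int) -> list[int]:
    """Nodes (1-based) holding the range whose primary replica is `primary`."""
    return [((primary - 1 + i) % n_nodes) + 1 for i in range(rf)]


def satisfiable(n_nodes: int, rf: int, down: set[int]) -> dict[str, list[bool]]:
    """For each primary range, report whether ONE, QUORUM and ALL can be met."""
    required = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    result: dict[str, list[bool]] = {level: [] for level in required}
    for primary in range(1, n_nodes + 1):
        # Count live replicas for this range.
        alive = sum(1 for node in replicas(primary, n_nodes, rf) if node not in down)
        for level, needed in required.items():
            result[level].append(alive >= needed)
    return result


if __name__ == "__main__":
    table = satisfiable(n_nodes=5, rf=3, down={2, 5})
    for level, ok in table.items():
        print(f"{level:6} {sum(ok)}/{len(ok)} ranges available")
```

With nodes 2 and 5 down this reports ONE available for 5/5 ranges, QUORUM for 4/5 (only the range whose replicas are nodes 5, 1 and 2 loses quorum) and ALL for 0/5. Under these assumptions CL ONE should have kept working, so a total outage with TTransportException suggests the failures came from somewhere other than replica availability, such as clients stuck on the failing nodes, in line with the questions in the reply. A different RF or placement strategy would change the numbers.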
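To put the bloom_filter_fp_chance advice from the reply in numbers, here is a rough Python estimate using the textbook formula for an optimally sized bloom filter: m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive rate p. This formula is assumed for illustration only; Cassandra's own filter sizing may round differently, so treat the results as ballpark figures for one node's row count. The function names are hypothetical.

```python
import math

# Textbook sizing of an optimally tuned bloom filter (an ASSUMPTION here, not
# necessarily Cassandra's exact allocation): m = -n * ln(p) / (ln 2)^2 bits.


def bloom_bits_per_key(fp_chance: float) -> float:
    """Bits needed per key to reach the target false-positive probability."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mb(n_keys: int, fp_chance: float) -> float:
    """Approximate total filter size in megabytes (10^6 bytes)."""
    return n_keys * bloom_bits_per_key(fp_chance) / 8 / 1e6


if __name__ == "__main__":
    rows = 500_000_000  # the row-count threshold mentioned in the reply
    for p in (0.000744, 0.01):  # old and new defaults quoted in the reply
        print(f"fp_chance={p}: {bloom_bits_per_key(p):.1f} bits/key, "
              f"~{bloom_size_mb(rows, p):.0f} MB for {rows:,} rows")
```

For 500 million rows the formula gives about 15 bits per key (roughly 937 MB) at 0.000744 versus about 9.6 bits per key (roughly 599 MB) at 0.01, so filter memory grows linearly with row count and drops noticeably as the false-positive chance is raised.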