Subject: Re: High BloomFilterFalseRation
From: Daniel Doubleday
Date: Wed, 27 Oct 2010 16:14:31 +0200
To: user@cassandra.apache.org

Hm, I'm not sure I understand the randomness question. We are using RandomPartitioner (RP), but I wouldn't know why that should matter. I thought that the bloom filter hash function should distribute evenly no matter what keys come in.

Keys are '/'-separated strings (aka paths :-))

I do bulk inserts like this (1000 rows at a time, with ~50 cols each):

[
  {'a/b/foo': cols},
  {'a/b/bar': cols},
  {'a/b/baz': cols}
]

and before that I query for 'a/b', recursively, as in mkdir -p. If parent paths are missing, they are inserted with the bulk insert.

The value for BloomFilterFalseRatio has been in the range of 0.19 - 0.59 in the last couple of hours, mostly around 0.3.

We're on 0.6.6, btw.

On Oct 27, 2010, at 3:58 PM, Jonathan Ellis wrote:

> This is not expected, no. How random are your queries? If you have a
> couple outlier rows causing the false positives that are being queried
> over and over then that could just be the luck of the draw.
>
> On Wed, Oct 27, 2010 at 5:24 AM, Daniel Doubleday wrote:
>> Hi people
>>
>> We are currently moving our second use case from MySQL to Cassandra. While importing the data (ongoing) I noticed that the BloomFilterFalseRatio seems to be pretty high compared to another CF which is in use in production right now.
>>
>> It's a hierarchical data model and I cannot avoid doing a read before inserting multiple columns.
>>
>> I see a false positive ratio of 0.28, while in my other CF it is 0.00025.
>>
>> The CF had 5 live sstables when I read that ratio. At that time I had inserted ~200k rows with a total of 1M cols. Row keys are pretty large unfortunately (key.length() ~ 60).
>>
>> Just wanted to check if this value is to be expected.
>>
>> Thanks,
>> Daniel
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
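
For readers following the thread, the read-before-insert step described in the message (look up 'a/b' and its ancestors as in mkdir -p, then add any missing parents to the bulk insert) could be sketched roughly as below. This is only an illustration under assumptions, not the poster's actual code: the names `parent_paths` and `missing_parents` are hypothetical, and the `exists` callback is a stand-in for whatever read is issued against the column family. Each `exists` call on a path that is not stored is the kind of lookup where a bloom filter false positive can show up.

```python
from typing import Callable, Iterable, List, Set


def parent_paths(key: str) -> List[str]:
    """Return all ancestor paths of a '/'-separated key, deepest first."""
    parts = key.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def missing_parents(keys: Iterable[str],
                    exists: Callable[[str], bool]) -> List[str]:
    """Collect parent paths that are not yet stored.

    `exists` is a hypothetical stand-in for a read against the store.
    The walk up the tree stops at the first ancestor that exists (or that
    was already checked for an earlier key), as mkdir -p would.
    """
    seen: Set[str] = set()
    missing: List[str] = []
    for key in keys:
        for parent in parent_paths(key):
            if parent in seen:
                break  # this ancestor and everything above it is handled
            seen.add(parent)
            if exists(parent):
                break  # ancestors of an existing path are assumed to exist
            missing.append(parent)
    return missing


if __name__ == "__main__":
    stored = {"a"}  # pretend only the top-level path is present
    batch = ["a/b/foo", "a/b/bar", "a/b/baz"]
    print(missing_parents(batch, lambda path: path in stored))
```

In this sketch the three sibling keys trigger only two reads ('a/b' and 'a'), because the result for a shared parent is remembered within the batch.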