From: Jonathan Ellis
Date: Tue, 4 May 2010 15:50:28 -0500
Subject: Re: BloomFilter is taking too much memory
To: user@cassandra.apache.org

BloomFilter is not redundant, because it stores information about _all_ keys, while the index summary stores only every 128th key.

On Tue, May 4, 2010 at 3:47 PM, Weijun Li wrote:
> Hello,
>
> We stored about 47 million keys in one Cassandra node, and here is what a
> memory dump shows for one of the SSTableReaders:
>
>     SSTableReader: 386MB. Of this 386MB, IndexSummary takes about 231MB,
> and BloomFilter takes 155MB with an embedded huge array long[19.4mil].
>
> It seems that BloomFilter is taking too much memory. If this is the case,
> BloomFilter seems to be redundant compared to the size of the index.
>
> So is this the desired behavior? Is there a formula to estimate the amount
> of memory needed for BloomFilter?
>
> Thanks,
>
> -Weijun

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
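On the question of a formula: a rough estimate comes from the textbook Bloom filter sizing rule, m = -n * ln(p) / (ln 2)^2 bits for n keys at false-positive rate p, with k = (m / n) * ln 2 hash functions. The Python below is a minimal sketch under that assumption only. Cassandra's own BloomFilter may size its bit set differently (for example, a fixed number of bits per key), so treat the results as ballpark figures, not as what Cassandra allocates. It also checks the arithmetic in the dump (19.4 million Java longs at 8 bytes each is about 155MB). Assuming the every-128th-key sampling mentioned in the reply, it also counts how many entries the index summary keeps. All function names here are hypothetical.

```python
import math


def bloom_filter_bits(n_keys: int, false_positive_rate: float) -> int:
    """Textbook optimal Bloom filter size in bits: m = -n * ln(p) / (ln 2)^2.

    Assumption: this is the generic formula, not necessarily Cassandra's.
    """
    return math.ceil(-n_keys * math.log(false_positive_rate) / (math.log(2) ** 2))


def optimal_hash_count(n_keys: int, m_bits: int) -> int:
    """Optimal number of hash functions: k = (m / n) * ln 2, at least 1."""
    return max(1, round(m_bits / n_keys * math.log(2)))


def false_positive_rate(n_keys: int, m_bits: int, k_hashes: int) -> float:
    """Standard approximation of the false-positive rate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes


def long_array_mb(num_longs: float) -> float:
    """Memory of a Java long[] payload in MB (10^6 bytes); each long is 8 bytes."""
    return num_longs * 8 / 1e6


def index_summary_entries(n_keys: int, interval: int = 128) -> int:
    """Entries kept by a summary that samples one key out of every `interval`."""
    return math.ceil(n_keys / interval)


if __name__ == "__main__":
    # Cross-check the dump: long[19.4mil] should be roughly 155MB.
    print(f"long[19.4M] payload: {long_array_mb(19.4e6):.1f} MB")

    n = 47_000_000  # keys on the node, from the quoted message
    for p in (0.01, 0.001):
        m = bloom_filter_bits(n, p)
        k = optimal_hash_count(n, m)
        print(
            f"p={p}: {m / n:.2f} bits/key, {m / 8 / 1e6:.1f} MB, k={k}, "
            f"approx. FP rate {false_positive_rate(n, m, k):.4f}"
        )

    # The summary keeps far fewer entries than the Bloom filter covers,
    # which is why it cannot answer "is this key absent?" on its own.
    print(f"index summary entries: {index_summary_entries(n)}")
```

The key difference the reply points at shows up in the last line: the index summary only records a sample of keys, so checking whether an arbitrary key exists still requires reading the on-disk index, whereas the Bloom filter covers every key and can rule out most absent keys without any disk access.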