From: "Stu Hood (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Sun, 5 Dec 2010 20:14:12 -0500 (EST)
Subject: [jira] Commented: (CASSANDRA-1555) Considerations for larger bloom filters
Message-ID: <32601942.121261291598052406.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/CASSANDRA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967060#action_12967060 ]

Stu Hood commented on CASSANDRA-1555:
-------------------------------------

* If BloomFilter is
likely to be deprecated in favor of BigBloomFilter, naming them LegacyBloomFilter and BloomFilter might reduce the surface area of future changes
* Probably a good opportunity to improve the serialization of BigBloomFilter: Java serialization is very wasteful of space (each row would contain the string "org.apache.cassandra.utils.obs.OpenBitSet"). Instead, just serializing an OpenBitSet as a {{long[]}} and # of valid bits would be much better
* (Big)BloomFilter
  * maxBucketsPerElement can be pushed up into Filter
  * getFilter could probably be pushed up to Filter, or at least removed from BloomFilter
  * emptyBuckets is unused
* Orphaned method BigBloomFilter.serializeBitSet
* Indentation is off in SSTableReader and BigBloomFilter

I'm working on a separate issue to refresh LegacySSTableTest to check the column-level bloom filters as well: see CASSANDRA-1822

> Considerations for larger bloom filters
> ---------------------------------------
>
>                 Key: CASSANDRA-1555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1555
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Ryan King
>             Fix For: 0.8
>
>         Attachments: cassandra-1555.tgz, CASSANDRA-1555v2.patch
>
>
> To (optimally) support SSTables larger than 143 million keys, we need to support bloom filters larger than 2^31 bits, which java.util.BitSet can't handle directly.
> A few options:
> * Switch to a BitSet class which supports 2^31 * 64 bits (Lucene's OpenBitSet)
> * Partition the java.util.BitSet behind our current BloomFilter
> ** Straightforward bit partitioning: bit N is in bitset N // 2^31
> ** Separate equally sized complete bloom filters for member ranges, which can be used independently or OR'd together under memory pressure.
> All of these options require new approaches to serialization.
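
For reference, the "straightforward bit partitioning" option from the issue description (bit N is in bitset N // 2^31) might look roughly like the sketch below. The class name PartitionedBitSet and its methods are hypothetical and are not taken from either attached patch, which appears to follow the OpenBitSet route instead; the sketch only illustrates the index arithmetic on top of java.util.BitSet.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: a bit set addressed by long indices and backed by
 * several java.util.BitSet partitions. Bit n is stored in partition
 * n / partitionBits at offset n % partitionBits.
 */
final class PartitionedBitSet {
    private final long totalBits;
    private final long partitionBits;
    private final BitSet[] partitions;

    PartitionedBitSet(long totalBits, long partitionBits) {
        // java.util.BitSet accepts indices 0 .. 2^31 - 1, so one partition
        // can hold at most 2^31 bits.
        if (totalBits <= 0 || partitionBits <= 0 || partitionBits > (1L << 31)) {
            throw new IllegalArgumentException("invalid sizes");
        }
        long count = (totalBits + partitionBits - 1) / partitionBits;
        if (count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many partitions");
        }
        this.totalBits = totalBits;
        this.partitionBits = partitionBits;
        this.partitions = new BitSet[(int) count];
        for (int i = 0; i < partitions.length; i++) {
            partitions[i] = new BitSet(); // grows on demand as bits are set
        }
    }

    /** Index of the partition that holds bit n. */
    int partitionOf(long n) {
        checkIndex(n);
        return (int) (n / partitionBits);
    }

    /** Offset of bit n inside its partition. */
    int offsetOf(long n) {
        checkIndex(n);
        return (int) (n % partitionBits);
    }

    void set(long n) {
        partitions[partitionOf(n)].set(offsetOf(n));
    }

    boolean get(long n) {
        return partitions[partitionOf(n)].get(offsetOf(n));
    }

    private void checkIndex(long n) {
        if (n < 0 || n >= totalBits) {
            throw new IndexOutOfBoundsException("bit index out of range: " + n);
        }
    }

    public static void main(String[] args) {
        // 2^33 addressable bits split into four partitions of 2^31 bits each.
        PartitionedBitSet bits = new PartitionedBitSet(1L << 33, 1L << 31);
        long n = (1L << 31) + 5;
        bits.set(n);
        System.out.println("partition " + bits.partitionOf(n) + ", offset " + bits.offsetOf(n));
    }
}
```

As the description notes, a layout like this would still need its own serialization format, for example writing each partition's words in order along with the total bit count.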