Message-ID: <10134988.22801291725972546.JavaMail.jira@thor>
Date: Tue, 7 Dec 2010 07:46:12 -0500 (EST)
From: "T Jake Luciani (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] Updated: (CASSANDRA-1555) Considerations for larger bloom filters

     [ https://issues.apache.org/jira/browse/CASSANDRA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-1555:
--------------------------------------

    Attachment: 1555_v5.txt

Fixed the murmur hash problem (the issue was with the use of ByteBuffers). Refactored the
code a bit. Put hash32 and hash64 into the MurmurHash class.

Overall I'm happy with this implementation, especially the sstable descriptor approach. +1

Stu, I wasn't able to apply your latest patch for tests. Could you rebase against v5?

> Considerations for larger bloom filters
> ---------------------------------------
>
>                 Key: CASSANDRA-1555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1555
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Ryan King
>             Fix For: 0.8
>
>         Attachments: 1555_v5.txt, addendum-to-1555.txt, cassandra-1555.tgz, CASSANDRA-1555v2.patch, CASSANDRA-1555v3.patch.gz, CASSANDRA-1555v4.patch.gz
>
>
> To (optimally) support SSTables larger than 143 million keys, we need to support bloom filters larger than 2^31 bits, which java.util.BitSet can't handle directly.
> A few options:
> * Switch to a BitSet class which supports 2^31 * 64 bits (Lucene's OpenBitSet)
> * Partition the java.util.BitSet behind our current BloomFilter
> ** Straightforward bit partitioning: bit N is in bitset N // 2^31
> ** Separate equally sized complete bloom filters for member ranges, which can be used independently or OR'd together under memory pressure.
> All of these options require new approaches to serialization.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
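
For illustration, the "straightforward bit partitioning" option from the issue description (bit N is in bitset N // 2^31) could look roughly like the sketch below. This is not the code in any of the attached patches. The class name PartitionedBitSet and the partitionShift parameter are hypothetical; a shift of 31 corresponds to the 2^31 bits per partition named in the issue, and smaller shifts only make the sketch cheap to try out.

```java
import java.util.BitSet;

// Hypothetical sketch: a long-indexed bit set backed by several java.util.BitSet
// partitions. Bit n lives in partition (n >>> partitionShift) at offset
// (n & partitionMask), i.e. n // 2^31 and n % 2^31 when partitionShift is 31.
final class PartitionedBitSet {
    private final int partitionShift;  // log2 of the number of bits per partition
    private final long partitionMask;  // (bits per partition) - 1
    private final long capacity;       // total number of addressable bits
    private final BitSet[] partitions;

    PartitionedBitSet(long numBits, int partitionShift) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        // java.util.BitSet indices are non-negative ints, so 2^31 bits is the most
        // one partition can hold.
        if (partitionShift < 1 || partitionShift > 31) {
            throw new IllegalArgumentException("partitionShift must be in [1, 31]");
        }
        long count = ((numBits - 1) >>> partitionShift) + 1;
        if (count > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many partitions");
        }
        this.partitionShift = partitionShift;
        this.partitionMask = (1L << partitionShift) - 1;
        this.capacity = numBits;
        this.partitions = new BitSet[(int) count];
        for (int i = 0; i < partitions.length; i++) {
            partitions[i] = new BitSet(); // starts small and grows as bits are set
        }
    }

    void set(long n) {
        checkIndex(n);
        partitions[(int) (n >>> partitionShift)].set((int) (n & partitionMask));
    }

    boolean get(long n) {
        checkIndex(n);
        return partitions[(int) (n >>> partitionShift)].get((int) (n & partitionMask));
    }

    int partitionCount() {
        return partitions.length;
    }

    private void checkIndex(long n) {
        if (n < 0 || n >= capacity) {
            throw new IndexOutOfBoundsException("bit index out of range: " + n);
        }
    }
}
```

With partitionShift = 31, new PartitionedBitSet(1L << 33, 31) would address 2^33 bits across four partitions. As the issue notes, any layout like this would still need its own serialization format, which the sketch does not attempt.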