Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3C673C7E2 for ; Sun, 3 Jun 2012 15:01:28 +0000 (UTC)
Received: (qmail 77452 invoked by uid 500); 3 Jun 2012 15:01:28 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 77016 invoked by uid 500); 3 Jun 2012 15:01:27 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 76981 invoked by uid 99); 3 Jun 2012 15:01:27 -0000
Received: from issues-vm.apache.org (HELO issues-vm) (140.211.11.160) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 03 Jun 2012 15:01:27 +0000
Received: from isssues-vm.apache.org (localhost [127.0.0.1]) by issues-vm (Postfix) with ESMTP id 0FDFE140BEF for ; Sun, 3 Jun 2012 15:01:27 +0000 (UTC)
Date: Sun, 3 Jun 2012 15:01:27 +0000 (UTC)
From: "Daniel Doubleday (JIRA)"
To: commits@cassandra.apache.org
Message-ID: <606326668.31790.1338735687067.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <153176138.24486.1338507564374.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (CASSANDRA-4303) Compressed bloomfilters
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CASSANDRA-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288173#comment-13288173 ]

Daniel Doubleday commented on CASSANDRA-4303:
---------------------------------------------

I totally understand / agree with what's being said: You don't want any portion of the bf not in memory for a cf that's under any significant read load.
My point was that for any significant read load and a reasonable amount of memory, the bloom filter will be in memory due to its random-access nature. I did spend some time on page-cache-related tests and found the page cache pretty hard to outsmart. Its generational design doesn't just page out stuff because you are running through some large files once. So my theory was that if a bloom filter is hot (and it gets hot pretty fast), it will stay in memory, or you are so under-equipped with RAM that it doesn't matter.

But I guess you are right that it doesn't really help the underlying issue that bloom filters get too large for a large number of rows. They need to be in memory one way or the other ...

It might be useful to be able to reduce the bf size dynamically though. So instead of reducing FP and rewriting the filters on disk, you could leave it at minimum and just do one more mod operation to map bit positions to buckets while deserializing.

> Compressed bloomfilters
> -----------------------
>
>                 Key: CASSANDRA-4303
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4303
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Brandon Williams
>             Fix For: 1.2
>
>
> Very commonly, people encountering an OOM need to increase their bloom filter false positive ratio to reduce memory pressure, since BFs tend to be the largest shareholder. It would make sense if we could alleviate the memory pressure from BFs with compression while maintaining the FP ratio (at the cost of a bit of cpu) that some users have come to expect. One possible implementation is at http://code.google.com/p/javaewah/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
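
For illustration, the "one more mod operation" idea from the comment (mapping bit positions of the on-disk filter to a smaller set of buckets while deserializing) could look roughly like the sketch below. This is only a sketch under assumptions: the class and method names (FoldedBloomFilter, fold, mightContain) are hypothetical and are not Cassandra's actual bloom filter code, the filter is modelled as a plain java.util.BitSet, and the hash values are assumed to be supplied by the caller. Folding keeps the no-false-negatives property because a lookup applies the same extra mod as the fold; the price is a higher false positive ratio in memory.

```java
import java.util.BitSet;

// Hypothetical sketch, not Cassandra code: shrink a bloom filter in memory
// by folding its bit positions into fewer buckets with one extra mod.
final class FoldedBloomFilter {

    // Fold the full-size filter (as stored on disk) into `buckets` bits.
    // Every set bit at position pos is mapped to position pos % buckets.
    static BitSet fold(BitSet full, int buckets) {
        BitSet folded = new BitSet(buckets);
        for (int pos = full.nextSetBit(0); pos >= 0; pos = full.nextSetBit(pos + 1)) {
            folded.set(pos % buckets);
        }
        return folded;
    }

    // Query the folded filter. `hashes` are the key's hash values (assumed
    // to come from the same hash functions used to build the full filter),
    // and `fullBits` is the bit count of the original on-disk filter.
    static boolean mightContain(BitSet folded, int buckets, int fullBits, long[] hashes) {
        for (long h : hashes) {
            // Position in the original filter, then the extra mod into buckets.
            int pos = (int) Math.floorMod(h, (long) fullBits);
            if (!folded.get(pos % buckets)) {
                return false; // definitely not present
            }
        }
        return true; // possibly present
    }

    public static void main(String[] args) {
        BitSet full = new BitSet(64);
        full.set(3);
        full.set(40);
        full.set(63);
        BitSet folded = fold(full, 16);
        System.out.println(mightContain(folded, 16, 64, new long[] {3L, 40L, 63L}));
    }
}
```

If the bucket count divides the original bit count, the folded positions stay evenly spread; with other bucket counts the lookup is still free of false negatives, but some buckets receive more bits than others.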