Date: Tue, 17 Feb 2009 11:11:00 -0800 (PST)
From: "Erik Holstad (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1200) Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20

[ https://issues.apache.org/jira/browse/HBASE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674308#action_12674308 ]

Erik
Holstad commented on HBASE-1200:
-------------------------------------

I think the user should have the option to turn bloom filters off, even though I can't really see why you would want to. I also think we should move towards row+column filtering, like BT.

Using the dynamic bloom filter seems like a reasonable way to go. The only drawback I can see is that we will still have some overhead, even though it is smaller than now. So, if possible, we should wait until we know the exact number of entries and then create the filter. I'm not sure how much time this would add to the flush, but that could be tested.

> Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1200
>                 URL: https://issues.apache.org/jira/browse/HBASE-1200
>             Project: Hadoop HBase
>          Issue Type: Task
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.20.0
>
>
> Add bloomfiltering to hfile. Should it be optional or always on? Currently, we bloom filter rows only, not the column + ts component, which seems a good place to start, but we size the bloomfilter with the number of entries we are about to flush, which means we would usually be making the filter too big. How do we figure out how many rows are in the flush? We should use the DynamicBloomFilter as Andrzej does up in hadoop BloomFilterMapFile. Start small and let it resize as entries are added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
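
To make the sizing trade-off discussed above concrete, here is a rough sketch in Python. It is not the Hadoop DynamicBloomFilter or any HBase code; the class names, the `keys_per_filter` threshold, the 1% false-positive target, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration. It compares three options: sizing one filter by entry count, sizing one filter by the exact row count once it is known, and starting small and appending fixed-size filters as rows arrive.

```python
import hashlib
import math


def optimal_size(num_keys, fp_rate):
    """Standard bloom filter sizing: bits m and hash count k for n keys."""
    m = math.ceil(-num_keys * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / num_keys * math.log(2)))
    return m, k


class BloomFilter:
    """Fixed-size bloom filter over byte-string keys (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class DynamicBloomFilter:
    """Starts with one small filter; appends another once the current one
    has recorded keys_per_filter keys (hypothetical parameter name)."""

    def __init__(self, keys_per_filter, fp_rate):
        self.keys_per_filter = keys_per_filter
        self.num_bits, self.num_hashes = optimal_size(keys_per_filter, fp_rate)
        self.filters = [BloomFilter(self.num_bits, self.num_hashes)]
        self.keys_in_current = 0

    def add(self, key):
        if self.keys_in_current >= self.keys_per_filter:
            self.filters.append(BloomFilter(self.num_bits, self.num_hashes))
            self.keys_in_current = 0
        self.filters[-1].add(key)
        self.keys_in_current += 1

    def __contains__(self, key):
        # A key may live in any of the sub-filters, so all must be checked.
        return any(key in f for f in self.filters)

    def total_bits(self):
        return len(self.filters) * self.num_bits


if __name__ == "__main__":
    # Assumed workload: 10,000 rows with 10 entries each in one flush.
    num_rows, entries_per_row = 10_000, 10
    by_entries, _ = optimal_size(num_rows * entries_per_row, 0.01)
    by_rows, _ = optimal_size(num_rows, 0.01)

    dyn = DynamicBloomFilter(keys_per_filter=1_000, fp_rate=0.01)
    for i in range(num_rows):
        dyn.add(b"row-%d" % i)  # add each distinct row once

    print("bits, sized by entries:  ", by_entries)
    print("bits, sized by exact rows:", by_rows)
    print("bits, dynamic filter:     ", dyn.total_bits())
```

In this sketch the dynamic variant only counts calls to `add`, so the caller has to add each row once (for example, when the row key changes while writing sorted entries). Note also that a lookup has to probe every sub-filter, so the combined false-positive rate grows with the number of sub-filters; that is part of the overhead that creating a single filter from the exact row count would avoid.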