hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1200) Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20
Date Fri, 13 Feb 2009 18:41:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673349#action_12673349 ]

stack commented on HBASE-1200:
------------------------------

The thing to do would be to run with them on for a while and then make a call before release.

Here is how BloomFilterMapFile sizes its filter:

{code}
    private synchronized void initBloomFilter(Configuration conf) {
      numKeys = conf.getInt("io.mapfile.bloom.size", 1024 * 1024);
      // vector size should be <code>-kn / (ln(1 - c^(1/k)))</code> bits for
      // single key, where <code>k</code> is the number of hash functions,
      // <code>n</code> is the number of keys and <code>c</code> is the desired
      // max. error rate.
      // Our desired error rate is by default 0.005, i.e. 0.5%
      float errorRate = conf.getFloat("io.mapfile.bloom.error.rate", 0.005f);
      vectorSize = (int)Math.ceil((double)(-HASH_COUNT * numKeys) /
          Math.log(1.0 - Math.pow(errorRate, 1.0/HASH_COUNT)));
      bloomFilter = new DynamicBloomFilter(vectorSize, HASH_COUNT,
          Hash.getHashType(conf), numKeys);
    }
{code}
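
To get a feel for what that formula produces, here is a standalone sketch of the same sizing arithmetic. The class and method names are made up for illustration, and the hash count of 5 is an assumption, since HASH_COUNT is not shown in the snippet above. It does the math in doubles and returns a long rather than casting to int.

```java
// Hypothetical helper, not part of Hadoop or HBase: reproduces the
// vector-size formula -kn / ln(1 - c^(1/k)) from the snippet above.
final class BloomSizing {

    /**
     * @param numKeys   n, the expected number of keys
     * @param hashCount k, the number of hash functions
     * @param errorRate c, the desired maximum false-positive rate (0 < c < 1)
     * @return the number of bits the formula asks for, rounded up
     */
    static long vectorSize(long numKeys, int hashCount, double errorRate) {
        if (numKeys <= 0 || hashCount <= 0 || errorRate <= 0.0 || errorRate >= 1.0) {
            throw new IllegalArgumentException("need n > 0, k > 0 and 0 < c < 1");
        }
        double numerator = -1.0 * hashCount * numKeys;
        double denominator = Math.log(1.0 - Math.pow(errorRate, 1.0 / hashCount));
        return (long) Math.ceil(numerator / denominator);
    }

    public static void main(String[] args) {
        long n = 1024L * 1024L;   // default of io.mapfile.bloom.size in the snippet
        int k = 5;                // assumed value for HASH_COUNT
        double c = 0.005;         // default of io.mapfile.bloom.error.rate in the snippet
        long bits = vectorSize(n, k, c);
        System.out.println("bits = " + bits + ", bits per key = " + ((double) bits / n));
    }
}
```

Under those assumptions the filter comes out somewhere between 11 and 13 bits per key, which is why sizing it by the number of entries rather than the number of rows would be wasteful.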

> Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1200
>                 URL: https://issues.apache.org/jira/browse/HBASE-1200
>             Project: Hadoop HBase
>          Issue Type: Task
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.20.0
>
>
> Add bloomfiltering to hfile.  Should it be optional or always on?  Currently, we bloom
> filter rows only, not the column + ts component, which seems a good place to start, but we size
> the bloomfilter with the number of entries we are about to flush, which means we'd usually
> be making the filter too big.  How do we figure out how many rows are in the flush?  We should use
> the DynamicBloomFilter as Andrzej does up in hadoop BloomFilterMapFile.  Start small and let
> it resize as entries are added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.