hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: [jira] Assigned: (HBASE-1200) Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20
Date Thu, 05 Mar 2009 09:41:07 GMT
I'm going to give this a shot tomorrow.

My plan is to use row:cf:colqual as the bloom filter key.  That way we can
test whether a specific row/column is in a specific file.  I might also add
a second, row-only bloom filter to test.
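To make the proposed key concrete, here is a minimal Python sketch. The email contains no code, so everything in it is an assumption: `BloomFilter` and `row_col_key` are hypothetical names, and this is not HBase's or Hadoop's implementation. It builds one filter keyed on row:cf:colqual and the optional row-only filter, both over the same cells.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over bytes keys (illustrative sketch, not HBase/Hadoop code)."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions from k differently-salted BLAKE2b digests.
        for i in range(self.num_hashes):
            d = hashlib.blake2b(key, digest_size=8, salt=i.to_bytes(16, "big")).digest()
            yield int.from_bytes(d, "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))


def row_col_key(row: bytes, family: bytes, qualifier: bytes) -> bytes:
    # Hypothetical byte encoding of the proposed row:cf:colqual bloom key.
    return b":".join((row, family, qualifier))


# One filter keyed on row:cf:colqual, plus the optional row-only filter.
row_col_bloom = BloomFilter()
row_bloom = BloomFilter()
for row, family, qualifier in [(b"row1", b"info", b"name"), (b"row2", b"info", b"age")]:
    row_col_bloom.add(row_col_key(row, family, qualifier))
    row_bloom.add(row)
```

The row-only filter answers "maybe" for row1 whenever row1 has any column in the file, while the row:cf:colqual filter can still rule out a specific column such as (row1, info, age). Joining with ':' means two different triples can encode to the same bytes if a component contains ':'; for a Bloom filter that only adds false positives, never false negatives, and a length-prefixed encoding would avoid it.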

Note that in general this would only be useful once we know that a specific
row/column exists and want to minimize the number of files we have to seek into and read.
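The point-lookup case can be sketched in the same hypothetical Python style. `TinyBloom`, `StoreFile` and `get` are illustrative names, not HBase classes, and the newest-first file order is an assumption. Each flushed file carries its own filter, and a get for one row/column only seeks into files whose filter answers "maybe".

```python
import hashlib


class TinyBloom:
    """Small Bloom filter using a Python int as the bit set (illustrative only)."""

    def __init__(self, num_bits: int = 4096, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: bytes):
        # k bit positions from k differently-salted BLAKE2b digests.
        for i in range(self.num_hashes):
            d = hashlib.blake2b(key, digest_size=8, salt=i.to_bytes(16, "big")).digest()
            yield int.from_bytes(d, "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))


class StoreFile:
    """Hypothetical stand-in for one flushed file: its cells plus a bloom filter."""

    def __init__(self, name, cells):
        self.name = name
        self.cells = dict(cells)  # (row, family, qualifier) -> value
        self.bloom = TinyBloom()
        for row, family, qualifier in self.cells:
            self.bloom.add(b":".join((row, family, qualifier)))


def get(files, row, family, qualifier):
    """Point lookup; returns (value or None, names of files actually read)."""
    key = b":".join((row, family, qualifier))
    read = []
    for f in files:  # assumed ordered newest first
        if not f.bloom.might_contain(key):
            continue  # definitely absent: skip the seek/read entirely
        read.append(f.name)
        value = f.cells.get((row, family, qualifier))
        if value is not None:
            return value, read
    return None, read


files = [
    StoreFile("file-3", {(b"r3", b"cf", b"q"): b"v3"}),
    StoreFile("file-2", {(b"r2", b"cf", b"q"): b"v2"}),
    StoreFile("file-1", {(b"r1", b"cf", b"q"): b"v1"}),
]
value, read = get(files, b"r1", b"cf", b"q")
```

A scan over a range of rows cannot use the filters this way, because it does not name a single key to test; that is why the benefit is limited to lookups of a specific row/column.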

-ryan

On Thu, Mar 5, 2009 at 1:27 AM, ryan rawson (JIRA) <jira@apache.org> wrote:

>
>     [ https://issues.apache.org/jira/browse/HBASE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> ryan rawson reassigned HBASE-1200:
> ----------------------------------
>
>    Assignee: ryan rawson  (was: stack)
>
> > Add bloomfilters to hfile; use dynamicbloomfilter instead of base bloomfilter; depend on hadoop 0.20
> > ----------------------------------------------------------------------------------------------------
> >
> >                 Key: HBASE-1200
> >                 URL: https://issues.apache.org/jira/browse/HBASE-1200
> >             Project: Hadoop HBase
> >          Issue Type: Task
> >            Reporter: stack
> >            Assignee: ryan rawson
> >             Fix For: 0.20.0
> >
> >
> > Add bloomfiltering to hfile.  Should it be optional or always on?
> > Currently, we bloom filter rows only, not the column + ts component, which
> > seems a good place to start, but we size the bloomfilter with the number of
> > entries we are about to flush, so we'd usually be making a filter that is
> > too big.  How do we figure out how many rows are in the flush?  We should use
> > the DynamicBloomFilter as Andrzej does up in hadoop BloomFilterMapFile:
> > start small and let it resize as entries are added.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
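The "start small and let it resize" idea from the JIRA description can be sketched as follows. This is a hypothetical Python illustration of the dynamic Bloom filter technique, not Hadoop's `DynamicBloomFilter` API; `FixedBloom`, `DynamicBloom`, `keys_per_filter` and `fp_rate` are assumed names. Each sub-filter is sized with the standard formula for `keys_per_filter` keys at `fp_rate`, and a new one is appended whenever the current one is full, so the flush never has to guess the final row count up front.

```python
import hashlib
import math


class FixedBloom:
    """One fixed-capacity Bloom filter, sized for `capacity` keys at `fp_rate`."""

    def __init__(self, capacity: int, fp_rate: float):
        self.capacity = capacity
        self.count = 0
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.num_bits = max(8, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: k positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class DynamicBloom:
    """Start with one small filter; append another each time the current one fills."""

    def __init__(self, keys_per_filter: int = 128, fp_rate: float = 0.01):
        self.keys_per_filter = keys_per_filter
        self.fp_rate = fp_rate
        self.filters = [FixedBloom(keys_per_filter, fp_rate)]

    def add(self, key: bytes) -> None:
        if self.filters[-1].count >= self.keys_per_filter:
            self.filters.append(FixedBloom(self.keys_per_filter, self.fp_rate))
        self.filters[-1].add(key)

    def might_contain(self, key: bytes) -> bool:
        # A key may live in any sub-filter, so membership is the OR over all of them.
        return any(f.might_contain(key) for f in self.filters)


bloom = DynamicBloom(keys_per_filter=128)
for i in range(1000):
    bloom.add(f"row-{i}".encode())
```

With 128 keys per sub-filter, 1000 rows end up in 8 sub-filters, so memory tracks the actual number of rows flushed. The trade-off is that a lookup has to check every sub-filter, and the combined false-positive rate grows with their number (roughly 1 - (1 - p)^s for s sub-filters at rate p).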
