couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: multiview using bloom filters
Date Sat, 25 Sep 2010 14:45:59 GMT
That said, I don't believe theirs is growable. If it is, it might be
interesting to benchmark it for speed. Or we could add the growable parts.
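Growable ("scalable") bloom filters, like the project Norman links to below, generally work like this: when a fixed-size filter reaches its capacity, a new, larger filter with a tighter error rate is appended, and a lookup checks every filter in the chain. The Python below is a minimal sketch of that general technique under assumed parameters (growth factor 2, error-tightening ratio 0.5). It is not ebloom's API or the Erlang library's code, and the class names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-capacity Bloom filter sized for `capacity` keys at false-positive rate `error`."""

    def __init__(self, capacity, error):
        self.capacity = capacity
        self.error = error
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.num_bits = math.ceil(-capacity * math.log(error) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class ScalableBloomFilter:
    """Growable filter: a chain of BloomFilters with growing capacity and shrinking error."""

    def __init__(self, initial_capacity=1000, error=0.001, growth=2, ratio=0.5):
        self.growth = growth  # assumed: each new slice holds `growth` times more keys
        self.ratio = ratio    # assumed: each new slice's error is `ratio` times smaller
        # Slice errors error*(1-r), error*(1-r)*r, ... form a series summing to `error`,
        # which bounds the overall false-positive rate.
        self.filters = [BloomFilter(initial_capacity, error * (1 - ratio))]

    def add(self, key):
        if key in self:
            return  # already (probably) present; avoid consuming capacity twice
        current = self.filters[-1]
        if current.count >= current.capacity:
            # Current slice is full: append a larger slice with a tighter error rate.
            current = BloomFilter(current.capacity * self.growth, current.error * self.ratio)
            self.filters.append(current)
        current.add(key)

    def __contains__(self, key):
        # A key may live in any slice, so check them all.
        return any(key in f for f in self.filters)


# Usage: start small and let the filter grow past its initial capacity.
sbf = ScalableBloomFilter(initial_capacity=100, error=0.01)
for i in range(1000):
    sbf.add(f"doc-{i}")
print(len(sbf.filters), "doc-42" in sbf)
```

The trade-off is that each lookup probes every slice, so a growable filter is a little slower to query than a single filter sized correctly up front. That is the speed difference worth measuring against a fixed-size implementation.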

On Sat, Sep 25, 2010 at 5:44 AM, Robert Dionne
<dionne@dionne-associates.com> wrote:
> Norman,
>
>   Basho also has a bloom filter implementation packaged as a separate project[1]
> that you might find useful. It's used in Bitcask.
>
> Cheers,
>
> Bob
>
>
>
> [1] http://github.com/basho/ebloom
>
>
>
>
> On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:
>
>> Paul,
>>
>> Yes, performance is much better: for some of our harder queries (e.g.
>> all docs over time with field X, which uses two views) it is about 10x
>> faster. I am testing with docs that emit ~100K keys in total (following
>> the Raindrop megaview).
>>
>> Some files in the scalable bloom filter project contained EPL headers
>> and others didn't. When googling for the source code I had seen other
>> projects add the EPL headers to the bit array files, so I did the same. I
>> will contact the author, as he seems active on the Erlang mailing lists;
>> if that doesn't work out, I will write a bloom filter from scratch (the
>> theory is well documented), though I like his code!
>>
>> Thanks for your help; let me know any suggestions you may have.
>>
>> thanks,
>>
>> Norman
>>
>>
>>
>> On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>> Norman,
>>>
>>> Just glanced through. Looks better. Any feel for the performance difference?
>>>
>>> Also, I glanced at the original files that you linked to. The bit
>>> array files didn't have a license, but what you've got there does have
>>> EPL headers. We need to make sure we have permission to add them. I
>>> would assume we do, but we have to be careful about such things at the
>>> ASF. You only need an email from the original author saying it's
>>> ok.
>>>
>>> I'm a bit caught up with some other code at the moment, I'll give a
>>> more thorough combing over tomorrow.
>>>
>>> Paul
>>>
>>> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker <norman.barker@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Thanks to Paul's excellent suggestion, I have rewritten the multiview
>>>> to use bloom filters. I was concerned that a bloom filter per view
>>>> would use too much memory, but thanks mainly to an excellent
>>>> implementation of bloom filters in Erlang
>>>> (http://sites.google.com/site/scalablebloomfilters/), they seem to be
>>>> very space efficient.
>>>>
>>>> The new code is here:
>>>>
>>>> http://github.com/normanb/couchdb/
>>>>
>>>> The code is simple and runs in a single process. Once we have agreed
>>>> on the approach, we can decide whether there is any benefit in moving
>>>> bloom filter generation into a separate process (using a gen_server).
>>>>
>>>> Comments as always appreciated, I will continue adding to the test suite.
>>>>
>>>> thanks for the help,
>>>>
>>>> Norman
>>>>
>>>
>
>
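Norman's first message explains why a bloom filter per view turned out to be affordable. The standard sizing formulas make this concrete: for n keys at a target false-positive rate p, a filter needs m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, independent of how long the keys are. The Python sketch below plugs in the ~100K-key figure from the thread. The 1% error rate is an assumption, and since each view's filter holds only that view's keys, the result is an upper bound per filter. The function name `bloom_size` is hypothetical.

```python
import math


def bloom_size(n, p):
    """Bits m and hash count k for n keys at false-positive rate p (standard formulas)."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


# Assumed numbers: ~100K keys (the figure from the thread) at a 1% false-positive rate.
m, k = bloom_size(100_000, 0.01)
print(f"{m} bits (~{m / 8 / 1024:.0f} KiB), {k} hash functions, {m / 100_000:.1f} bits/key")
```

At 1% this works out to roughly 9.6 bits per key, so even 100K document ids fit in well under a megabyte. That is small compared with storing the ids themselves.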

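For readers following the multiview discussion, here is a hypothetical sketch of how per-view bloom filters can speed up a multi-view intersection (e.g. "docs with field X" across two views). It streams one "driver" view and keeps only the doc ids that pass every other view's filter. This is not the code in Norman's repository. `SimpleBloom`, `multiview_intersect`, and the input shape (one list of doc ids per view) are all assumptions made for illustration.

```python
import hashlib


class SimpleBloom:
    """Minimal fixed-size Bloom filter over string keys (illustrative, not tuned)."""

    def __init__(self, num_bits=1 << 16, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # k hashes: blake2b over the key prefixed with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{key}".encode("utf-8"), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, key):
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(key))


def multiview_intersect(view_results):
    """Doc ids from the first view that (probably) also occur in every other view.

    `view_results` is a hypothetical input shape: one iterable of doc ids per view.
    No true match is dropped (Bloom filters have no false negatives), but a few ids
    absent from some view may slip through as false positives.
    """
    driver, *others = view_results
    # One filter per non-driver view, built from that view's emitted doc ids.
    filters = []
    for ids in others:
        bf = SimpleBloom()
        for doc_id in ids:
            bf.add(doc_id)
        filters.append(bf)
    # Stream the driver view and keep ids that pass every other view's filter.
    return [doc_id for doc_id in driver if all(doc_id in bf for bf in filters)]
```

The appeal of this design is that the non-driver views only need a compact filter in memory rather than a full set of ids. The cost is the occasional false positive, which an implementation would likely need to confirm against the real view rows. The thread doesn't say whether the multiview does that.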