couchdb-dev mailing list archives

From Norman Barker <norman.bar...@gmail.com>
Subject Re: multiview using bloom filters
Date Sun, 26 Sep 2010 16:05:05 GMT
I have added the formatting changes and contacted the author of the
scalable bloom filters project. It seems that bitarray (and the HiPE
version) came from a discussion on the Erlang mailing list:

http://groups.google.com/group/erlang-programming/browse_thread/thread/7c0191b1d709a5fe/ea5cf52b46d67d76?lnk=gst&q=bitarray#ea5cf52b46d67d76

but the author of bloom.erl should be able to confirm.

Any other comments? Has anyone had a chance to test it out?

thanks,

Norman


On Sat, Sep 25, 2010 at 8:45 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> Although, I don't believe theirs is growable. But if it is, that might
> be interesting to test for speed. Or we could add the growable parts.
>
> On Sat, Sep 25, 2010 at 5:44 AM, Robert Dionne
> <dionne@dionne-associates.com> wrote:
>> Norman,
>>
>>   Basho also has a bloom filter implementation packaged as a separate
>> project[1] that you might find useful. It's used in Bitcask.
>>
>> Cheers,
>>
>> Bob
>>
>>
>>
>> [1] http://github.com/basho/ebloom
>>
>>
>>
>>
>> On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:
>>
>>> Paul,
>>>
>>> yes, performance is actually much better: some of our harder queries
>>> (for example, all docs over time with field X, which spans two views)
>>> are 10x faster. I am testing with docs that in total emit ~100K of
>>> keys (following the Raindrop megaview).
>>>
>>> Some of the files in the scalable bloom filter project contained EPL
>>> headers and others didn't. Googling for the source code, I had seen
>>> other projects add the EPL headers to bitarray, so I did the same. I
>>> will contact the author, as he seems active on the Erlang mailing
>>> lists; if not, I will write a bloom filter from scratch. The theory is
>>> well documented, though I like his code!
>>>
>>> thanks for your help, let me know any suggestions you may have.
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>>
>>>
>>> On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>> Norman,
>>>>
>>>> Just glanced through. Looks better. Any feeling for a performance difference?
>>>>
>>>> Also, I glanced at the original files that you linked to. The bit
>>>> array files didn't have a license, but what you've got there does have
>>>> EPL headers. We need to make sure we have permission to do so. I would
>>>> assume as much, but we have to be careful about such things in the
>>>> ASF. You only need to get an email from the original author saying
>>>> it's OK.
>>>>
>>>> I'm a bit caught up with some other code at the moment, I'll give a
>>>> more thorough combing over tomorrow.
>>>>
>>>> Paul
>>>>
>>>> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> thanks to Paul's excellent suggestion I have rewritten the multiview
>>>>> to use bloom filters. I had a concern that a bloom filter per view
>>>>> would use too much memory, but thanks mainly to the excellent
>>>>> implementation of bloom filters in Erlang
>>>>> (http://sites.google.com/site/scalablebloomfilters/) they seem to be
>>>>> very space efficient.
>>>>>
>>>>> New code is here
>>>>>
>>>>> http://github.com/normanb/couchdb/
>>>>>
>>>>> The code is simple, all in one process. Once we have agreed on the
>>>>> approach we can decide if there is any benefit in making the bloom
>>>>> filter generation occur in a separate process (using a gen_server).
>>>>>
>>>>> Comments as always appreciated, I will continue adding to the test suite.
>>>>>
>>>>> thanks for the help,
>>>>>
>>>>> Norman
>>>>>
>>>>
>>
>>
>
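
The idea in the original message (one bloom filter per view, used to
combine the results of several views) can be sketched roughly as
follows. This is only an illustration in Python, not the Erlang code in
the linked repository and not the scalable bloom filters library: the
names BloomFilter and multiview_candidates, the fixed (non-growable)
sizing, and the SHA-256 based double hashing are all assumptions made
for the example.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; not the Erlang code discussed above)."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit parts of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def multiview_candidates(view_a_ids, view_b_ids, capacity=100_000):
    """Return the ids from view B that are probably also present in view A."""
    bloom = BloomFilter(capacity)
    for doc_id in view_a_ids:
        bloom.add(doc_id)
    # False positives are possible; false negatives are not.
    return [doc_id for doc_id in view_b_ids if doc_id in bloom]
```

Because a bloom filter can report false positives, a real multiview
would presumably still need to confirm each candidate against the first
view before returning it; the sketch leaves that step out.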
