couchdb-dev mailing list archives

From Norman Barker <norman.bar...@gmail.com>
Subject Re: multiview using bloom filters
Date Sat, 25 Sep 2010 03:21:08 GMT
Paul,

yes, performance is actually much better. For some of our harder
queries (for example, all docs over time with field X, which spans two
views) it is about 10x faster. I am testing with docs that emit ~100K
keys in total (following the raindrop megaview).

Some files in the scalable bloom filter project contained EPL headers
and others didn't. While googling for the source code I saw that other
projects had added the EPL headers to the bit array files, so I did the
same. I will contact the author, as he seems active on the Erlang
mailing lists. If that fails I will write a bloom filter from scratch;
the theory is well documented, though I like his code!
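
As a rough illustration of what a from-scratch version involves, here is a
minimal sketch. It is written in Python rather than Erlang, it is a
fixed-size filter rather than the scalable variant from the project above,
and the names `BloomFilter` and `intersect_views` are hypothetical (they are
not taken from the multiview code). It assumes doc ids are strings and that
the intersection step only needs "probably in the other view" answers.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride odd (non-zero)
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def intersect_views(view_a_ids, view_b_ids, error_rate=0.001):
    """Return the doc ids from view_b that are probably also in view_a."""
    view_a_ids = list(view_a_ids)
    bloom = BloomFilter(max(1, len(view_a_ids)), error_rate)
    for doc_id in view_a_ids:
        bloom.add(doc_id)
    # May include a few false positives, but never drops a true match.
    return [doc_id for doc_id in view_b_ids if doc_id in bloom]
```

Because a bloom filter can report false positives, the result is a superset
of the true intersection; a caller that needs exact answers would still have
to check the candidates against the first view.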

thanks for your help, let me know any suggestions you may have.

thanks,

Norman



On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> Norman,
>
> Just glanced through. Looks better. Any feeling for the performance difference?
>
> Also, I glanced at the original files that you linked to. The bit
> array files didn't have a license, but what you've got there does have
> EPL headers. We need to make sure we have permission to do so. I would
> assume as much, but we have to be careful about such things in the
> ASF. You only need to get an email from the original author saying it's
> ok.
>
> I'm a bit caught up with some other code at the moment. I'll give it a
> more thorough combing over tomorrow.
>
> Paul
>
> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker <norman.barker@gmail.com> wrote:
>> Hi,
>>
>> Thanks to Paul's excellent suggestion I have rewritten the multiview
>> to use bloom filters. I was concerned that a bloom filter per view
>> would use too much memory, but thanks mainly to an excellent
>> implementation of bloom filters in Erlang
>> (http://sites.google.com/site/scalablebloomfilters/) they seem to be
>> very space efficient.
>>
>> New code is here
>>
>> http://github.com/normanb/couchdb/
>>
>> The code is simple and runs in a single process. Once we have agreed
>> on the approach we can decide whether there is any benefit in moving
>> the bloom filter generation into a separate process (using a gen_server).
>>
>> Comments as always appreciated, I will continue adding to the test suite.
>>
>> thanks for the help,
>>
>> Norman
>>
>
