Date: Sun, 26 Sep 2010 10:05:05 -0600
Subject: Re: multiview using bloom filters
From: Norman Barker
To: dev@couchdb.apache.org
References: <9BAD16AA-1F30-4584-BEFC-0296BA203286@dionne-associates.com>

I have added the formatting changes and contacted the author of
scalable bloom filters. It seems that bitarray (and the hipe version)
came from a discussion on the erlang mailing lists

http://groups.google.com/group/erlang-programming/browse_thread/thread/7c0191b1d709a5fe/ea5cf52b46d67d76?lnk=gst&q=bitarray#ea5cf52b46d67d76

but the author of bloom.erl should be able to confirm.

Any other comments? Has anyone had a chance to test it out?

thanks,

Norman

On Sat, Sep 25, 2010 at 8:45 AM, Paul Davis wrote:
> Although, I don't believe theirs is growable. But if it is, that might
> be interesting to test for speed. Or we could add the growable parts.
>
> On Sat, Sep 25, 2010 at 5:44 AM, Robert Dionne wrote:
>> Norman,
>>
>> Basho also has a bloom filter implementation packaged as a separate
>> project[1] that you might find useful. It's used in Bitcask.
>>
>> Cheers,
>>
>> Bob
>>
>> [1] http://github.com/basho/ebloom
>>
>> On Sep 24, 2010, at 11:21 PM, Norman Barker wrote:
>>
>>> Paul,
>>>
>>> yes, performance is actually much better: some of our harder
>>> queries (e.g. all docs over time with field X, which uses two views)
>>> are 10x faster. I am testing with docs that in total emit ~100K keys
>>> (following the raindrop megaview).
>>>
>>> Some files in the scalable bloom filter project contained EPL headers,
>>> others didn't. Googling for the source code, I had seen other projects
>>> add the EPL headers to bit array, so I did the same. I will contact the
>>> author, as he seems active on the erlang mailing lists, and if not I
>>> will write a bloom filter from scratch. The theory is well documented,
>>> though I like his code!
>>>
>>> thanks for your help, let me know any suggestions you may have.
>>>
>>> thanks,
>>>
>>> Norman
>>>
>>> On Fri, Sep 24, 2010 at 7:16 PM, Paul Davis wrote:
>>>> Norman,
>>>>
>>>> Just glanced through. Looks better. Any feeling for performance
>>>> differences?
>>>>
>>>> Also, I glanced at the original files that you linked to. The bit
>>>> array files didn't have a license, but what you've got there does have
>>>> EPL headers. We need to make sure we have permission to do so. I would
>>>> assume as much, but we have to be careful about such things in the
>>>> ASF. You only need to get an email from the original author saying it's
>>>> ok.
>>>>
>>>> I'm a bit caught up with some other code at the moment, I'll give it a
>>>> more thorough combing over tomorrow.
>>>>
>>>> Paul
>>>>
>>>> On Fri, Sep 24, 2010 at 7:54 PM, Norman Barker wrote:
>>>>> Hi,
>>>>>
>>>>> thanks to Paul's excellent suggestion I have rewritten the multiview
>>>>> to use bloom filters. I had a concern that a bloom filter per view
>>>>> would use too much memory, but thanks in the main to the excellent
>>>>> implementation of bloom filters in erlang
>>>>> (http://sites.google.com/site/scalablebloomfilters/) they seem to be
>>>>> very space efficient.
>>>>>
>>>>> New code is here
>>>>>
>>>>> http://github.com/normanb/couchdb/
>>>>>
>>>>> The code is simple, all one process. Once we have agreed the approach
>>>>> we can decide if there is any benefit in making the bloom filter
>>>>> generation occur in a separate process (using a gen_server).
>>>>>
>>>>> Comments as always appreciated, I will continue adding to the test
>>>>> suite.
>>>>>
>>>>> thanks for the help,
>>>>>
>>>>> Norman
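
The idea in the original message (one bloom filter per view, used to combine views) can be sketched roughly as below. This is a minimal Python illustration only. It is not the code in the normanb/couchdb repository and not the bloom.erl API. The names BloomFilter and multiview_candidates, the fixed 0.1% error rate, the fixed-size (non-growable) filter, and the assumption that a multiview query intersects the document ids emitted by each view are all hypothetical choices made for the example.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; not growable, not the bloom.erl API)."""

    def __init__(self, capacity, error_rate=0.001):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def multiview_candidates(views):
    """Return doc ids from the first view that every other view's filter accepts.

    `views` is a list of lists of doc ids (an assumed stand-in for view output).
    The result can contain false positives but never drops a true match.
    """
    first, *rest = views
    filters = []
    for ids in rest:
        bf = BloomFilter(capacity=max(1, len(ids)))
        for doc_id in ids:
            bf.add(doc_id)
        filters.append(bf)
    return [doc_id for doc_id in first if all(doc_id in bf for bf in filters)]


if __name__ == "__main__":
    view_a = ["doc1", "doc2", "doc3"]
    view_b = ["doc2", "doc3", "doc4"]
    print(multiview_candidates([view_a, view_b]))
```

Because a bloom filter can report false positives, the returned ids are candidates; a real implementation would presumably confirm them against the views. On the memory concern: with the sizing formula above and the assumed 0.1% error rate, a filter needs about 14.4 bits per key, so roughly 180 KB for ~100K keys.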