From: Jan Lehnardt
To: dev@couchdb.apache.org
Subject: Re: View Intersections
Date: Sat, 11 Apr 2009 16:06:01 +0100
Message-Id: <6788FFD7-6897-4091-8404-7BCFCE8EDDBF@apache.org>
In-Reply-To: <7db9abd30904102031y5434f379x73d611912617b0cd@mail.gmail.com>

On 11 Apr 2009, at 04:31, kowsik wrote:

> Parallel == multiple-threads across multiple-machines in the
> cluster? :-)
>
> By definition, temp views don't have no disk IO.

They must get data to process from somewhere :)

> It's map/reduce
> parallelized in memory directly served back over a TCP socket. Is that
> still not going to be fast enough?

A common fallacy with CouchDB's Map/Reduce is thinking that doing
things on multiple nodes is magically faster. The sweet spot for
Map/Reduce is heavy computation on small bits of distributed data.
CouchDB's views are the opposite: little computation on huge amounts
of data. Unless your data is already distributed across participating
nodes, distributed M/R is not going to make anything faster.

With upcoming clustering, you get partial data distribution and
parallel execution, but that doesn't mean that anything has to
change in the current view server code. (It has other areas that
are open for speed improvements.)

Cheers
Jan
--

>
> K.
>
> On Fri, Apr 10, 2009 at 7:26 PM, Paul Davis wrote:
>> On Fri, Apr 10, 2009 at 8:51 PM, kowsik wrote:
>>> IMHO, the need for view intersections will go away once we have
>>> parallel map/reduce to the point where _temp_views's are fast!
>>>
>>> K.
>>>
>>
>> The lower bound for view generation is disk I/O. Temp views will
>> never be fast enough for production.
>>
>> HTH,
>> Paul Davis
>>
>>> On Fri, Apr 10, 2009 at 10:04 AM, Wout Mertens wrote:
>>>>
>>>> On Apr 10, 2009, at 11:46 AM, Sho Fukamachi wrote:
>>>>
>>>>> the obvious followup question to those examples is "well, how do
>>>>> I find a document with all of (n) tags?".
>>>>
>>>> How about this algorithm.
>>>> Needed: tagcount view and document-by-tag view
>>>>
>>>> - given a list of tags that the document should have
>>>> - find the tag that has the lowest document count with the
>>>>   tagcount view
>>>> - request all documents with that tag through the document-by-tag
>>>>   view
>>>> - filter manually on documents that match
>>>>
>>>> If that would mean too many documents, make a view that emits all
>>>> combinations of 2 tags a document has, that way you filter by
>>>> that much more.
>>>>
>>>> It would be neat if one could post a temporary view that runs
>>>> against a subset of the output of a real view. That way the
>>>> viewserver farm could do the filtering...
>>>>
>>>> Wout.
>>>>
>>>
>>
>
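
For reference, the rarest-tag algorithm Wout describes above can be
sketched in a few lines. This is only an in-memory illustration, not
CouchDB code: the sample documents, the "tags" field name, and every
function name below are assumptions made for the example, and plain
dictionaries stand in for the tagcount and document-by-tag views.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample documents; the "tags" field name is an assumption.
DOCS = {
    "a": {"tags": ["couchdb", "erlang", "views"]},
    "b": {"tags": ["couchdb", "views"]},
    "c": {"tags": ["erlang"]},
    "d": {"tags": ["couchdb", "erlang", "views", "http"]},
}


def build_by_tag(docs):
    """Stand-in for the document-by-tag view: tag -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for tag in doc["tags"]:
            index[tag].add(doc_id)
    return index


def tag_counts(by_tag):
    """Stand-in for the tagcount view: tag -> number of documents."""
    return {tag: len(ids) for tag, ids in by_tag.items()}


def docs_with_all_tags(docs, by_tag, counts, wanted):
    """Return sorted ids of documents that carry every tag in `wanted`."""
    wanted = set(wanted)
    if not wanted:
        return []
    # Step 1: find the wanted tag with the lowest document count.
    rarest = min(wanted, key=lambda tag: counts.get(tag, 0))
    # Step 2: request only the documents that have that tag.
    candidates = by_tag.get(rarest, set())
    # Step 3: filter the candidates manually against the full tag list.
    return sorted(d for d in candidates if wanted <= set(docs[d]["tags"]))


def tag_pairs(doc):
    """Keys a hypothetical two-tag view would emit for one document."""
    return list(combinations(sorted(set(doc["tags"])), 2))


by_tag = build_by_tag(DOCS)
counts = tag_counts(by_tag)
print(docs_with_all_tags(DOCS, by_tag, counts, ["couchdb", "erlang"]))
# prints ['a', 'd']
```

Starting from the rarest tag keeps the candidate set, and therefore the
manual filtering work, as small as a single-tag lookup allows. The
tag_pairs helper shows the keys that the suggested two-tag view would
emit if that candidate set is still too large.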