couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: View Intersections
Date Sat, 11 Apr 2009 15:06:01 GMT

On 11 Apr 2009, at 04:31, kowsik wrote:

> Parallel == multiple threads across multiple machines in the cluster? :-)
>
> By definition, temp views don't have any disk IO.

They must get data to process from somewhere :)


> It's map/reduce parallelized in memory, directly served back over a TCP
> socket. Is that still not going to be fast enough?

A common fallacy with CouchDB's Map/Reduce is thinking that doing things
on multiple nodes is magically faster.

The sweet spot for Map/Reduce is heavy computation on small bits of
distributed data. CouchDB's views are the opposite: little computation
on huge amounts of data. Unless your data is already distributed across
participating nodes, distributed M/R is not going to make anything faster.
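
A deliberately crude cost model can make this argument concrete. The sketch below is an illustration, not anything from CouchDB: it assumes view building is just "read bytes from disk, then spend a fixed CPU cost per byte", that nodes split the work evenly, and that when the data is not pre-distributed one node must read everything and push the other shares over a single network link. All function names and parameters are hypothetical.

```python
def local_seconds(total_bytes: float, disk_bw: float, cpu_per_byte: float) -> float:
    """One node reads all the data from its own disk and runs the map over it."""
    return total_bytes / disk_bw + total_bytes * cpu_per_byte


def distributed_seconds(total_bytes: float, nodes: int, disk_bw: float,
                        net_bw: float, cpu_per_byte: float,
                        already_distributed: bool) -> float:
    """Each node reads and maps an equal share of the data in parallel.

    If the data is not already spread across the nodes, one node must first
    read everything and ship the other nodes' shares over its single link.
    """
    share = total_bytes / nodes
    parallel_work = share / disk_bw + share * cpu_per_byte
    if already_distributed:
        return parallel_work
    shipping = total_bytes / disk_bw + (total_bytes - share) / net_bw
    return shipping + parallel_work
```

With made-up parameters (1 GB of documents, 100 MB/s disk and network, 4 nodes, a cheap map at 1e-10 s/byte), the model gives roughly 10.1 s locally, about 20 s when the data must be shipped first, and about 2.5 s when it already lives on the nodes. Only when the per-byte computation becomes heavy does shipping the data start to pay off, which matches the "sweet spot" described above.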

With the upcoming clustering, you get partial data distribution and parallel
execution, but that doesn't mean that anything has to change in the
current view server code. (It has other areas that are open for speed
improvements.)

Cheers
Jan
--

>
> K.
>
> On Fri, Apr 10, 2009 at 7:26 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> On Fri, Apr 10, 2009 at 8:51 PM, kowsik <kowsik@gmail.com> wrote:
>>> IMHO, the need for view intersections will go away once we have
>>> parallel map/reduce to the point where _temp_views are fast!
>>>
>>> K.
>>>
>>
>> The lower bound for view generation is disk I/O. Temp views will never
>> be fast enough for production.
>>
>> HTH,
>> Paul Davis
>>
>>> On Fri, Apr 10, 2009 at 10:04 AM, Wout Mertens <wout.mertens@gmail.com> wrote:
>>>>
>>>> On Apr 10, 2009, at 11:46 AM, Sho Fukamachi wrote:
>>>>
>>>>> The obvious follow-up question to those examples is "well, how do I
>>>>> find a document with all of (n) tags?".
>>>>
>>>> How about this algorithm? Needed: a tagcount view and a document-by-tag view.
>>>>
>>>> - given a list of tags that the document should have
>>>> - find the tag that has the lowest document count with the tagcount view
>>>> - request all documents with that tag through the document-by-tag view
>>>> - filter manually on documents that match
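
The four steps above can be sketched in a few lines. This is a minimal sketch that assumes the two views have already been queried: the hypothetical `tag_counts` dict stands in for the tagcount view (tag to number of documents), `docs_by_tag` stands in for the document-by-tag view, and `doc_tags` holds the tags of each fetched document (what you would get back with the documents themselves). None of these names are CouchDB APIs.

```python
from typing import Dict, Iterable, List, Set


def docs_with_all_tags(
    wanted: Iterable[str],
    tag_counts: Dict[str, int],         # hypothetical stand-in for the tagcount view
    docs_by_tag: Dict[str, List[str]],  # hypothetical stand-in for the document-by-tag view
    doc_tags: Dict[str, Set[str]],      # tags of each fetched document
) -> List[str]:
    wanted_set = set(wanted)
    if not wanted_set:
        return []
    # A tag that never appears in the count view cannot match any document.
    if any(tag not in tag_counts for tag in wanted_set):
        return []
    # Pick the tag with the lowest document count to keep the candidate set small.
    rarest = min(wanted_set, key=lambda tag: tag_counts[tag])
    # Fetch only the documents carrying that tag.
    candidates = docs_by_tag.get(rarest, [])
    # Filter client-side: keep documents that carry every wanted tag.
    return [doc_id for doc_id in candidates if wanted_set <= doc_tags[doc_id]]
```

For example, with `tag_counts = {"couchdb": 3, "erlang": 2}`, asking for `["couchdb", "erlang"]` fetches only the two "erlang" documents and checks each for "couchdb". The cost of the manual filter is bounded by the count of the rarest tag, which is why picking it first matters.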
>>>>
>>>> If that would mean too many documents, make a view that emits all
>>>> combinations of 2 tags a document has; that way you can filter that
>>>> much more.
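
The "combinations of 2 tags" view can be emulated as follows. A real CouchDB map function would normally be JavaScript calling `emit()` with an array key; the Python below only mimics that behaviour, and `map_tag_pairs`, `build_pair_index` and `lookup_pair` are hypothetical names. The one detail that matters is sorting the tags, so each unordered pair has a single canonical key that both the map step and the query agree on.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[str, str]


def map_tag_pairs(doc_id: str, tags: List[str]) -> Iterator[Tuple[Pair, str]]:
    # Sort and de-duplicate so every unordered pair has one canonical key:
    # ("couchdb", "erlang") is emitted, never ("erlang", "couchdb").
    for pair in combinations(sorted(set(tags)), 2):
        yield pair, doc_id


def build_pair_index(docs: Dict[str, List[str]]) -> Dict[Pair, List[str]]:
    # Plays the role of the view index: key -> ids of the documents that emitted it.
    index: Dict[Pair, List[str]] = defaultdict(list)
    for doc_id, tags in docs.items():
        for key, value in map_tag_pairs(doc_id, tags):
            index[key].append(value)
    return dict(index)


def lookup_pair(index: Dict[Pair, List[str]], tag_a: str, tag_b: str) -> List[str]:
    # Queries must order the two tags the same way the map step did.
    return index.get(tuple(sorted((tag_a, tag_b))), [])
```

The trade-off is index size: a document with k tags emits k(k-1)/2 rows, so five tags already produce ten rows. The pair lookup then replaces the "rarest single tag" lookup as the starting candidate set for the manual filter.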
>>>>
>>>> It would be neat if one could post a temporary view that runs against a
>>>> subset of the output of a real view. That way the viewserver farm could do
>>>> the filtering...
>>>>
>>>> Wout.
>>>>
>>>
>>
>

