couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: view building batch sizes
Date Thu, 09 Dec 2010 16:31:25 GMT
On Dec 9, 2010, at 10:49 AM, Paul Davis wrote:

> On Thu, Dec 9, 2010 at 10:47 AM, Jan Lehnardt <jan@apache.org> wrote:
>> 
>> On 9 Dec 2010, at 15:37, Paul Davis wrote:
>> 
>>> On Thu, Dec 9, 2010 at 7:51 AM, Jan Lehnardt <jan@apache.org> wrote:
>>>> Hi Huw,
>>>> 
>>>> 
>>>> On 9 Dec 2010, at 13:32, Huw Selley wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I read on http://guide.couchdb.org/draft/performance.html that
>>>>> 
>>>>> "Views load a batch of updates from disk, pass them through the view
>>>>> engine, and then write the view rows out. Each batch is a few hundred
>>>>> documents, so the writer can take advantage of the bulk efficiencies
>>>>> we see in the next section."
>>>>> 
>>>>> Is there a method to change the batch size? I would like to try to
>>>>> measure the impact of using smaller and larger batches.
>>>> 
>>>> Thanks for helping to profile things. You may want to take this to
>>>> dev@couchdb.apache.org as it is the development-related mailing list.
>>>> 
>>>> For tuning these values, see src/couchdb/couch_view_updater.erl
>>>> 
>>>> The `update()` function has these lines:
>>>> 
>>>>    {ok, MapQueue} = couch_work_queue:new(100000, 500),
>>>>    {ok, WriteQueue} = couch_work_queue:new(100000, 500),
>>>> 
>>>> They set up one queue each for mapping and writing. The parameters are
>>>> 
>>>>    couch_work_queue:new(MaxSize, MaxItems)
>>>> 
>>>> If either maximum is hit, the queue is deemed full.
>>>> 
>>>> Note: This is from about 30 seconds of looking at the source, so I
>>>> might have missed a subtlety or three.
>>>> 
>>>> Cheers
>>>> Jan
>>>> --
>>>> 
>>>> 
>>>> 
>>> 
>>> The only real subtlety is that we don't wait for a minimum amount to
>>> be inserted into the queue. Playing with larger or smaller queues on
>>> either side might be interesting. Also, for testing it might
>>> not be a bad idea to add config options for these values.
>> 
>> 
>> Good thinking, I made a patch:
>> 
>>  https://github.com/janl/couchdb/commit/547691a9f4b9895086f2763af84e1cc459e4d72c
>> 
>> Branch:
>> 
>>  https://github.com/janl/couchdb/tree/config-view-batches
>> 
>> "Compiles for me".
>> 
>> To make this proper, we probably want to move the lookups into
>> couch_view_group:init/3 and pass the values down, but it should
>> be ok as is.
>> 
> 
> Probably not worth it. ets lookups like that are quick, and as a
> percentage of a view build are going to be fairly inconsequential.
> Though, you could make a case about coupling.

This issue was initially raised in https://issues.apache.org/jira/browse/COUCHDB-700. I'm
not opposed to making the queue sizes configurable, but I think the more important fix by
far is to be able to configure a minimum number of items in the work unit sent to the reducer.
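
To make the two knobs concrete, here is a minimal sketch in Python rather than Erlang. It is a toy, single-threaded model and not the actual couch_work_queue code: the class, its method names, and the `min_items` parameter are all hypothetical. It models the "full when either maximum is hit" rule described earlier in the thread, plus a consumer that declines to take a batch until a minimum number of items is queued.

```python
from collections import deque


class WorkQueue:
    """Toy model of a bounded work queue. Hypothetical; not CouchDB code."""

    def __init__(self, max_size, max_items):
        self.max_size = max_size    # cap on total bytes queued
        self.max_items = max_items  # cap on number of items queued
        self.items = deque()
        self.size = 0
        self.closed = False

    def is_full(self):
        # The queue counts as full as soon as either limit is reached.
        return self.size >= self.max_size or len(self.items) >= self.max_items

    def put(self, item):
        """Queue a bytes item. Returns False when full (a real producer would block)."""
        if self.is_full():
            return False
        self.items.append(item)
        self.size += len(item)
        return True

    def close(self):
        # Signals that the producer is done; remaining items may be drained.
        self.closed = True

    def take(self, min_items=1):
        """Return everything queued as one batch, or None if it is too early.

        min_items is the assumed "minimum work unit" knob. A batch is also
        released when the queue is full or closed, so a minimum larger than
        the queue's capacity cannot stall the pipeline.
        """
        ready = (
            self.closed
            or self.is_full()
            or len(self.items) >= min_items
        )
        if not ready:
            return None
        batch = list(self.items)
        self.items.clear()
        self.size = 0
        return batch
```

With `min_items=1` this behaves like a queue that hands over whatever is available, which matches the "we don't wait for a minimum amount" subtlety mentioned above. Raising `min_items` trades latency for larger batches.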

Cheers,

Adam