couchdb-dev mailing list archives

From Dave Cottlehuber <...@jsonified.com>
Subject Re: [jira] [Commented] (COUCHDB-1334) Indexer speedup (for non-native view servers)
Date Wed, 12 Dec 2012 06:50:14 GMT
Thanks Filipe!

Yes I will definitely have a crack at this. I recall overlapped IO &
this makes sense. Ta.

Dave

On 11 December 2012 22:37, Filipe Manana (JIRA) <jira@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/COUCHDB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529361#comment-13529361 ]
>
> Filipe Manana commented on COUCHDB-1334:
> ----------------------------------------
>
> Dave,
>
> I think we need to pass the 'overlapped_io' option to the open_port call in couch_os_process.
> This ensures that parallel reads and writes to the underlying pipe on Windows do not
> block [1], allowing the Erlang C module to do async IO [2] with the pipe.
> Are you able to try this?
>
> Reducing the buffer size to something very small, such as 8 bytes, doesn't block on Linux
> or OS X (with both small docs, such as 20 bytes, and very large docs, such as 100 KB).
> So this seems to be a purely Windows-specific issue.
>
> [1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365150(v=vs.85).aspx
> [2] https://github.com/erlang/otp/blob/maint/erts/emulator/sys/win32/sys.c#L2246
>
>> Indexer speedup (for non-native view servers)
>> ---------------------------------------------
>>
>>                 Key: COUCHDB-1334
>>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1334
>>             Project: CouchDB
>>          Issue Type: Improvement
>>          Components: Database Core, JavaScript View Server, View Server Support
>>            Reporter: Filipe Manana
>>            Assignee: Filipe Manana
>>             Fix For: 1.3
>>
>>         Attachments: 0001-More-efficient-view-updater-writes.patch,
>>                      0002-More-efficient-communication-with-the-view-server.patch,
>>                      master-0002-More-efficient-communication-with-the-view-server.patch,
>>                      master-2-0002-More-efficient-communication-with-the-view-server.patch,
>>                      master-3-0002-More-efficient-communication-with-the-view-server.patch,
>>                      master-4-0002-More-efficient-communication-with-the-view-server.patch
>>
>>
>> The following 2 patches significantly improve view index generation/update time and
>> reduce CPU consumption.
>> The first patch makes the view updater's batching more efficient by ensuring each
>> btree bulk insertion adds/removes a minimum of N (=100) key/value pairs. This also makes
>> the index file grow more slowly with old data (basically old btree nodes). This behaviour
>> is already implemented in master/trunk in the new indexer (by Paul Davis).
>> The second patch maximizes the throughput with an external view server (such as couchjs).
>> It makes the pipe (Erlang port) communication between the Erlang VM (couch_os_process,
>> basically) and the view server more efficient, since the 2 sides spend less time blocked
>> on reading from the pipe.
>> Here follow some benchmarks.
>> test database at  http://fdmanana.iriscouch.com/test_db  (1 million documents)
>> branch 1.2.x
>> $ echo 3 > /proc/sys/vm/drop_caches
>> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
>> {"rows":[
>> {"key":null,"value":1000000}
>> ]}
>> real  2m45.097s
>> user  0m0.006s
>> sys   0m0.007s
>> view file size: 333Mb
>> CPU usage:
>> $ sar 1 60
>> 22:27:20  %usr  %nice   %sys   %idle
>> 22:27:21   38      0     12     50
>> (....)
>> 22:28:21   39      0     13     49
>> Average:     39      0     13     47
>> branch 1.2.x + batch patch (first patch)
>> $ echo 3 > /proc/sys/vm/drop_caches
>> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
>> {"rows":[
>> {"key":null,"value":1000000}
>> ]}
>> real  2m12.736s
>> user  0m0.006s
>> sys   0m0.005s
>> view file size 72Mb
>> branch 1.2.x + batch patch + os_process patch
>> $ echo 3 > /proc/sys/vm/drop_caches
>> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
>> {"rows":[
>> {"key":null,"value":1000000}
>> ]}
>> real  1m9.330s
>> user  0m0.006s
>> sys   0m0.004s
>> view file size:  72Mb
>> CPU usage:
>> $ sar 1 60
>> 22:22:55  %usr  %nice   %sys   %idle
>> 22:23:53   22      0      6     72
>> (....)
>> 22:23:55   22      0      6     72
>> Average:     22      0      7     70
>> master/trunk
>> $ echo 3 > /proc/sys/vm/drop_caches
>> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
>> {"rows":[
>> {"key":null,"value":1000000}
>> ]}
>> real  1m57.296s
>> user  0m0.006s
>> sys   0m0.005s
>> master/trunk + os_process patch
>> $ echo 3 > /proc/sys/vm/drop_caches
>> $ time curl http://localhost:5984/test_db/_design/test/_view/test1
>> {"rows":[
>> {"key":null,"value":1000000}
>> ]}
>> real  0m53.768s
>> user  0m0.006s
>> sys   0m0.006s
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
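
The batching rule in the quoted issue description (each btree bulk insertion adds/removes a minimum of N = 100 key/value pairs) can be illustrated with a rough sketch. This is not the code from the patch, which is Erlang; it is a Python illustration of the idea only, and the names batched_bulk_insert, bulk_insert and MIN_BATCH are hypothetical. It also assumes the last, leftover batch is flushed even when it is smaller than N.

```python
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, int]

# The "N (=100)" from the issue description; the name is made up for this sketch.
MIN_BATCH = 100


def batched_bulk_insert(
    map_results: Iterable[List[KV]],
    bulk_insert: Callable[[List[KV]], None],
    min_batch: int = MIN_BATCH,
) -> int:
    """Pass per-document map results to bulk_insert in batches of at
    least min_batch key/value pairs. Returns the number of calls made."""
    pending: List[KV] = []
    calls = 0
    for kvs in map_results:
        pending.extend(kvs)
        # Only touch the btree once enough pairs have accumulated.
        if len(pending) >= min_batch:
            bulk_insert(pending)
            calls += 1
            pending = []
    # Assumption: the leftover is flushed at the end, even if it is
    # smaller than min_batch.
    if pending:
        bulk_insert(pending)
        calls += 1
    return calls


if __name__ == "__main__":
    batches: List[List[KV]] = []
    docs = ([("key%d" % i, 1)] for i in range(250))
    n = batched_bulk_insert(docs, batches.append)
    print(n, [len(b) for b in batches])
```

With 250 documents emitting one pair each, the sketch makes 3 bulk insertions instead of 250. Fewer, larger insertions rewrite fewer btree nodes, which is consistent with the smaller view file reported in the benchmarks.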
