couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Updated: (COUCHDB-160) replication performance improvements
Date Thu, 13 Nov 2008 18:44:44 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-160:
-----------------------------------

    Attachment: couch_rep_v2.diff

Here's an updated patch that uses persistent connections and pipelining to further accelerate
replications where the source is remote.  Updated benchmarks indicate a 3x improvement in
performance for remote-local relative to my first patch, or a total of 10x faster replications
than trunk:

parallel+pipeline:
local-remote    31
remote-remote   36
remote-local    13
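
The ratios quoted above can be checked against the numbers in this message. A minimal Python sketch, with the timings copied from the trunk, first-patch, and parallel+pipeline tables (the variable and function names are made up for illustration, and the units are whatever the original measurements used):

```python
# Timings copied from the tables in this message; units as reported.
trunk = {"local-remote": 145, "remote-remote": 173, "remote-local": 146}
first_patch = {"local-remote": 38, "remote-remote": 64, "remote-local": 35}
pipelined = {"local-remote": 31, "remote-remote": 36, "remote-local": 13}


def speedup(before, after):
    """Return how many times faster `after` is than `before`, per scenario."""
    return {name: round(before[name] / after[name], 1) for name in after}


vs_first_patch = speedup(first_patch, pipelined)
vs_trunk = speedup(trunk, pipelined)

for scenario in pipelined:
    print(
        f"{scenario:14s} "
        f"{vs_first_patch[scenario]}x vs first patch, "
        f"{vs_trunk[scenario]}x vs trunk"
    )
```

For remote-local this works out to roughly 2.7x over the first patch and about 11x over trunk, in line with the rounded 3x and 10x figures.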

Note the asymmetry for local-remote vs. remote-local.  Replications to remote targets are
still negotiating a new TCP connection for every POST.  We can't pipeline POSTs, since HTTP/1.1
clients shouldn't pipeline non-idempotent requests, but there's nothing wrong with sending them
over persistent connections.  Last I heard, Erlang's HTTP client needs to be updated to deal
with that particular use case:

http://www.erlang.org/pipermail/erlang-questions/2008-August/037113.html

Best, Adam

> replication performance improvements
> ------------------------------------
>
>                 Key: COUCHDB-160
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-160
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.9
>            Reporter: Adam Kocoloski
>            Priority: Minor
>         Attachments: couch_rep.erl.diff, couch_rep_v2.diff
>
>
> I wrote some code to speed up CouchDB's replication process by parallelizing document
> requests and using _bulk_docs to write changes to the target.  I tested the speedup as follows:
> * 1000 document DB, 1022 update_seq, ~450 KB after compaction
> * local and remote machines have ~45 ms latency
> * timed requests using timer:tc(couch_rep, replicate, [<<"source">>, <<"target">>])
> * all replications are "from scratch"
> trunk:
> local-local     115
> local-remote    145
> remote-remote   173
> remote-local    146
> db size after replication: 1.8 MB
> patch:
> local-local     1.83
> local-remote    38
> remote-remote   64
> remote-local    35
> db size after replication: 453 KB
> I'll attach the patch as an update to this issue.  It might be worth exposing the "batch
> size" (currently 100 docs) as a configurable parameter.  Comments welcome.  Best,
> Adam
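
To illustrate the batching idea from the description above, here is a rough Python sketch (not the Erlang code in the patch) that groups documents into batches of 100 and builds one `{"docs": [...]}` request body per batch, which is the shape a `_bulk_docs` POST expects. The helper names and the sample documents are hypothetical, and the batch size is a parameter only because the description suggests making it configurable:

```python
import json
from itertools import islice

BATCH_SIZE = 100  # the "batch size" mentioned in the description


def batches(docs, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_docs_bodies(docs, size=BATCH_SIZE):
    """Yield one JSON body per batch, suitable for POSTing to _bulk_docs."""
    for chunk in batches(docs, size):
        yield json.dumps({"docs": chunk})


# Hypothetical sample data: 250 small documents.
docs = [{"_id": f"doc-{i}", "value": i} for i in range(250)]
bodies = list(bulk_docs_bodies(docs))
print(len(bodies), "request bodies")
```

With 250 documents this produces three bodies (100, 100, and 50 docs), so the target sees three writes instead of 250.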

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

