couchdb-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Attila Nagy <...@fsn.hu>
Subject Why is replication so slow?
Date Fri, 16 Jul 2010 16:51:33 GMT
Hi,

I have three equal machines with Pentium(R) D CPU 3.20GHz, 2GiB RAM, 
FreeBSD 8, Erlang R13B04 (erts-5.7.5) [source] [64-bit] [smp:2:2] [rq:2] 
[async-threads:0] [hipe] [kernel-poll:false], and CouchDB 1.0.0.

I would like to replicate documents between the three (even more 
machines later) in a fully meshed replica agreement (every node 
replicates from/to every other to ensure that there is no SPoF and every 
document gets to others ASAP). The nodes would store small, but quickly 
changing documents (application no. 1) and larger (from several kBs to 
several GBs) binary attachments (application no. 2). The applications 
are not mixed on the same CouchDB instance (even the machines).

I've experimented with the first and noticed that no matter how fast 
insert documents (BTW, I could achieve about 230 inserts per second, 
parallel connections, no bulk inserts) the traffic between the machines 
doesn't go beyond about 500 kBps and the replicas lag behind the written 
node (a lot!).

Based on this, I've started another test, now with smaller binary 
attachments. The first run did this:
for i in `jot 128`
do
curl -X PUT http://localhost:5984/testdb/$i/file -H "Content-Type: 
application/octet-stream" --data-binary @bin1
done

That is, it uploads 128 MB of data (bin1 is 1MB of size).

Without replication, it runs in 8.64 seconds (14.81 MBps, not that fast 
either, but hey, it's erlang :). If I run it with background curl 
processes (maximum 128 parallel uploads), the script runs in 6.74s 
(18.99 MBps).

Now if I make a one way replica to another node (connected with gigabit 
ethernet), the run time slightly increases to 7.04s on the master node, 
but it takes 42 seconds (3.04 MBps) for all the 128 documents to reach 
the slave node.
Things get worse when I make a two way replication between the two 
nodes, this time the upload on the "master" node takes 7.4 seconds, but 
75 seconds are needed for the two nodes to become consistent. The erlang 
processes on both sides eat more resources, so this slowdown is 
completely visible, not network bound (of course).

If I make two one way replications (A->B, A->C node), the times look 
like this: time needed to upload on the master (A) node: 6.52s, time 
needed for the slave (B, C) nodes to get consistent with A: 44s (A->B), 
39s (A->C).
BTW, I calculate this from the start of the script (I'm not writing the 
data on A and then set up replication).

With the following replications defined: A<->B, A<->C, I get these: 
uploading to A: 7.34s, A->B consistency: 72s , A->C consistency: 72s
During the process I saw this on node A:
   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
15427 couchdb     11 109    0   217M   149M CPU0    0  14:44 135.94% 
beam.smp
and this was after the upload has been done, so this is what CouchDB 
eats when doing bilateral replication towards two nodes.

And now the full mesh (A<->B, A<->C, B<->C):
CouchDB resource usage tops:
15427 couchdb     11 110    0   270M   202M CPU1    1  18:44 140.14% 
beam.smp

and the consistency times also: A->B: 125s, A->C: 107s.
BTW, the upload lasted for 7.59s.

Summary: it seems unilateral replication is consistent in it's resource 
usage, and it's pretty slow (7s on localhost write vs. 42s of 
replication to the remote node). If I define a bilateral replication it 
slows down further, nearly to the half. Every bilateral agreement 
introduces this slowdown, so one unilateral: 42s, one bilateral: 72s, 
two bilaterals: 125s.

I'm sure it's not about waiting for the network or disk, it seems to be 
pure resource usage problem. Is this known? Will it be fixed?

Thanks,

Mime
View raw message