couchdb-user mailing list archives

From Wayne Conrad <wa...@databill.com>
Subject Re: Replication: stalled?
Date Tue, 01 Mar 2011 02:41:16 GMT
On 02/21/2011 11:18 AM, Wayne Conrad wrote:
> I'm seeing replication behavior that I don't understand.  I wonder if 
> it's stalled.

It's not stalled.  It's going very, very slowly.  I think I understand why.

Some of my documents have tens of thousands of attachments.  When I
first started storing the fat documents in CouchDB, it took half an hour
or more to add them.  To make it faster, and to prevent timeouts, I
store the attachments inline, but in chunks of 100 attachments at a 
time.  Doing that, even my largest documents take only a minute or so to 
store.
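
To illustrate the chunking, here is a rough sketch in Python (not the code I actually run).  It only splits the attachments into groups of 100 and builds the inline form CouchDB accepts in a document's `_attachments` field (a content type plus base64 data).  The function name, the shape of the input mapping, and the batch size default are all made up for the example, and sending each batch to the server is left out.

```python
import base64
from itertools import islice


def inline_attachment_batches(attachments, batch_size=100):
    """Yield one `_attachments`-style dict per batch of attachments.

    `attachments` is a hypothetical input shape: a dict mapping each
    attachment name to a (content_type, raw_bytes) tuple.
    """
    items = iter(attachments.items())
    while True:
        batch = list(islice(items, batch_size))
        if not batch:
            return
        # Inline attachments carry their content base64-encoded in "data".
        yield {
            name: {
                "content_type": content_type,
                "data": base64.b64encode(raw).decode("ascii"),
            }
            for name, (content_type, raw) in batch
        }
```

With 250 attachments this yields three batches of 100, 100, and 50, each of which would go into one update of the document.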

I can store a document with 32,768 attachments of 4k each in 55 seconds
(about 2.4M/sec).  But to replicate that document (using "pull" replication)
takes 19.5 minutes.  That's 115k per second.  Storing, then, is about 20 times
faster than replicating.  When I look at the log on the source database,
I see that the destination database is retrieving one attachment at a 
time, and (I presume) experiencing the same speed problem that caused me 
to write my "store bunches of attachments at a time" optimization.  Now 
it seems that, in order for replication to have any chance of keeping up 
with the rate at which I can store data, I'm going to need the same sort 
of optimization during replication.
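
For anyone who wants to check the arithmetic, here is a small Python sketch.  It assumes "4k" means 4,096 bytes and reports rates in decimal units (1 kB = 1,000 bytes); the variable names are just for the example.

```python
ATTACHMENTS = 32_768
ATTACHMENT_BYTES = 4 * 1024        # assumption: "4k" means 4,096 bytes
STORE_SECONDS = 55
REPLICATE_SECONDS = 19.5 * 60      # 19.5 minutes

total_bytes = ATTACHMENTS * ATTACHMENT_BYTES
store_rate = total_bytes / STORE_SECONDS          # bytes per second
replicate_rate = total_bytes / REPLICATE_SECONDS  # bytes per second
ratio = REPLICATE_SECONDS / STORE_SECONDS

print(f"total:     {total_bytes / 1e6:.0f} MB")
print(f"store:     {store_rate / 1e6:.1f} MB/sec")
print(f"replicate: {replicate_rate / 1e3:.0f} kB/sec")
print(f"ratio:     {ratio:.1f}x")
```

Under those assumptions the document is about 134 MB, storing runs at roughly 2.4 MB/sec, replication at roughly 115 kB/sec, and storing comes out a little over 21 times faster.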

I'm a couch toddler, and when it comes to Erlang, I'm not even on solid 
food yet.  What are the odds of me writing my own replication engine in, 
say, Ruby, one that can do the special optimizations I need?  How 
difficult a project is it?

