incubator-couchdb-user mailing list archives

From Wayne Conrad <>
Subject How to tell if replication is caught up?
Date Tue, 22 Mar 2011 19:27:30 GMT
My largest, ~600GB database was awful to compact.  Because much of it 
seldom changes, I sharded that database by account, yielding about 500 
databases of various sizes.  With a compaction daemon that only compacts 
a database when it grows, compaction is no longer a problem.  However, I 
appear to be suffering now when it comes to replication.

Five hundred continuous "pull" replications have the destination 
database crying for mercy.  Its four CPUs are continuously busy (load 
average ~4) and requests to the destination database occasionally time out.

The replication script starts a "pull" replication for each database, 
one at a time.  The replication requests start out taking about 0.3 
seconds per database, but towards the end of the list each request is 
taking many seconds.
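
For reference, one per-database "pull" request of the kind described above might look roughly like the following Python sketch. It is an illustration, not the actual script: the host names and the helper names `replication_body` and `start_pull` are hypothetical, and the timing wrapper is only there to show how the per-request latency could be observed. The only CouchDB call it relies on is a POST to /_replicate on the destination, with "source", "target", and "continuous" fields.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical hosts; substitute the real source and destination servers.
SOURCE = "http://source.example.com:5984"
TARGET = "http://localhost:5984"


def replication_body(db_name, source=SOURCE):
    """Build the JSON body for one continuous pull replication."""
    # Quote the name so characters such as "/" survive in the source URL.
    quoted = urllib.parse.quote(db_name, safe="")
    return {
        "source": f"{source}/{quoted}",  # remote database to pull from
        "target": db_name,               # local database on the destination
        "continuous": True,
    }


def start_pull(db_name, target=TARGET):
    """POST the request to the destination; return (response, seconds taken)."""
    body = json.dumps(replication_body(db_name)).encode("utf-8")
    request = urllib.request.Request(
        f"{target}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    started = time.monotonic()
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result, time.monotonic() - started
```

Calling `start_pull` once per database in a loop and logging the elapsed time would show where in the list the requests begin to slow down.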

Shortly after the replication starts, before it has gotten through more 
than a few dozen databases, there is a brief flood of stack traces (or 
whatever Erlang calls them) in the destination couch log.  I think there 
are fewer lines of error info than there are atoms in the sun, but only 
just.  Is there a guide that can help me figure out which lines of that 
log you would need to see?

The source database is not suffering: Its load average is < 1 and it 
serves requests quickly.

Due to the number of databases, I've added "ulimit -n 32768" to the 
startup script.

We're running version 1.2.0ac052866-git on Linux 2.6.32.  This version 
has the new replicator.

* Are we "doing it all wrong?"

* Can I expect the storm to abate once all of the replications are 
caught up?

* How can I tell which replications are "caught up?"  I see that a GET 
to /_active_tasks tells me that some replication tasks are "Starting" 
and others have, e.g., "Processed source seq 17", but I don't know if 
this is enough to know what's caught up and what's not.  Do I have to 
query the source database somehow to find out what source sequence is 
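
One possible shape for such a check, as a sketch only: read the sequence number out of the task status string and compare it with the "update_seq" field that a GET on the source database returns. The status format below is copied from the example in this message and may differ between versions; the helper names `processed_seq` and `is_caught_up` are hypothetical. The reported sequence may also lag behind what has actually been written, so a "not caught up" result can be stale.

```python
import re

# Status text as quoted above, e.g. "Processed source seq 17".
# This format is an assumption and may change between CouchDB versions.
_SEQ_PATTERN = re.compile(r"Processed source seq (\d+)")


def processed_seq(status):
    """Return the source sequence named in a task status, or None."""
    match = _SEQ_PATTERN.search(status)
    return int(match.group(1)) if match else None


def is_caught_up(status, source_update_seq):
    """True when the task reports a sequence at or past the source's update_seq.

    source_update_seq is the "update_seq" value from a GET on the source
    database, fetched by the caller.
    """
    seq = processed_seq(status)
    return seq is not None and seq >= source_update_seq
```

A task whose status is still "Starting" has no sequence to compare, so this sketch treats it as not caught up.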

Best Regards,
Wayne Conrad
