incubator-couchdb-user mailing list archives

From Max Ogden <...@maxogden.com>
Subject Re: CouchDB Crash report db_not_found when attempting to replicate databases
Date Tue, 13 Sep 2011 19:30:09 GMT
Hi Chris,

after installing https://github.com/joyent/node and http://npmjs.org/ you
can simply do

npm install replicate

and then

replicate http://sourcecouch/db http://destinationcouch/db

it will simply return a 'success' message when it completes. there isn't any
progress monitoring output yet. there also isn't support
for continuous replication.

or alternatively you could write custom node.js code for finer-grained
behavior.
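
as a rough sketch of where custom code could start (this is not the replicate
module's API, which I'm not reproducing here): the single-pass vs. continuous
choice you mention is just a field in the JSON body posted to couch's built-in
/_replicate endpoint. the helper name buildReplicationRequest and the host
names below are made up for illustration.

```javascript
// Sketch only: builds (but does not send) a request for CouchDB's built-in
// POST /_replicate endpoint. buildReplicationRequest is a hypothetical helper,
// and the host names are placeholders taken from the example above.
function buildReplicationRequest(couchUrl, source, target, continuous) {
  const body = { source: source, target: target };
  // Omit the field entirely for a single pass; set it for continuous mode.
  if (continuous) {
    body.continuous = true;
  }
  return {
    // Strip any trailing slashes so the path is not doubled up.
    url: couchUrl.replace(/\/+$/, "") + "/_replicate",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// One single-pass job and one continuous job for the same pair of databases.
const singlePass = buildReplicationRequest(
  "http://sourcecouch", "db", "http://destinationcouch/db", false);
const continuousJob = buildReplicationRequest(
  "http://sourcecouch", "db", "http://destinationcouch/db", true);

console.log(singlePass.method, singlePass.url);
console.log(singlePass.body);
console.log(continuousJob.body);
```

sending that with any http client only drives the built-in replicator, so it
won't be any more robust than what you have now, but it shows where a script
could decide per database which mode to use.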

cheers,

max

On Tue, Sep 13, 2011 at 12:19 PM, Chris Stockton
<chrisstocktonaz@gmail.com> wrote:

> Hello,
>
> On Tue, Sep 13, 2011 at 11:44 AM, Max Ogden <max@maxogden.com> wrote:
> > Hi Chris,
> >
> > From what I understand the current state of the replicator (as of 1.1) is
> > that for certain types of collections of documents it can be somewhat
> > fragile. In the case of the node.js package repository, http://npmjs.org,
> > there are many relatively large (~100MB) documents that would sometimes
> > throw errors or timeout during replication and crash the replicator, at
> > which point the replicator would restart and attempt to pick up where it
> > left off. I am not an expert in the internals of the replicator but
> > apparently the cumulative time required for the replicator to repeatedly
> > crash and then subsequently relocate itself in the _changes feed in the
> > case of replicating the node package manager was making the built-in couch
> > replicator unusable for the task.
> >
>
> First of all I thank you for your response, I appreciate your time. We
> have had a rocky road with replication as well, everything from system
> limits to single document/view/reduce errors causing processes to
> spawn wildly crippling machines. We have slowly worked through them by
> upping system limits and erlang VM limits.
>
> I feel like the absolute root cause of our problem is that we scale
> via many smaller databases instead of a single large one. We are at
> about 4200 databases right now and it's painful to netstat -nap|grep
> beam|wc -l and see 4200 active tcp connections. I have brought up
> suggestions and comments in the past about server wide replication,
> with some simple filtering function so a small pool of tcp connections
> and processes could be used, greatly improving our scaling pattern of
> many, small databases. I would be able to allocate time to try to
> contribute some kinda patch to do this, but I simply do not know
> erlang and it is very far from the languages I know (c, java, php,
> anything close to these.. erlang is an entirely different world)
>
> I have thought about changing our replication processes to only do
> single pass non-continuous replication, currently they manage and
> reconcile dropped replication tasks by monitoring status, using the
> continuous=true flag, but I may need to drop that at the cost of
> possible data loss if we get a crash in between passes.
>
> > Two solutions exist that I know of. There is a new replicator in trunk
> > (not to be confused with the _replicator db from 1.1 -- it is still using
> > the old replicator algorithms) and there is also a more reliable
> > replicator written in node.js https://github.com/mikeal/replicate that was
> > written specifically to replicate the node package repository between
> > hosting providers.
> >
>
> Is there any documentation on this? Although I have heard good things
> I am not familiar with node.js, I am interested in any alternatives
> that better fit our use cases. At the end of the day stability, data
> consistency and reliability for our customers for me is the biggest
> concern, right now we don't have that and it's what I'm aiming for, no
> more 2AM noc phone calls is the goal! :- )
>
> > Additionally it may be useful if you could describe the 'fingerprint' of
> > your documents a bit. How many documents are in the failing databases?
> > are the documents large or small? do they have many attachments? how
> > large is your _changes feed?
> >
>
> The failing databases do not share a common signature, some are very
> small, maybe 10 total documents, some may have more than 10 thousand.
> Some have had no changes for a very long time, some are recent. The
> failures shared no common ground based off my observations.
>
> Additional info:
>  - We have around 4200 databases
>  - The typical document is under 2kb, they are basically "table"
> rows, simple key/value pairs
>  - The changes feed is pretty small on most databases experiencing issues
>  - We compact databases which had changes each night
>  - A small percentage, about 10%, have attachments; they seem to not be
> related to our issues
>
> I am going to look into some of the alternative replicators you have
> given me, feel free to give any specific suggestions based on the
> above info.
>
> Thanks,
>
> -Chris
>
