On Mon, Oct 10, 2011 at 11:03 PM, Chris Stockton wrote:
> Hello,
>
> On Mon, Oct 10, 2011 at 5:18 PM, Adam Kocoloski wrote:
>> On Oct 10, 2011, at 8:02 PM, Chris Stockton wrote:
>>
>>> Hello,
>>>
>>> On Mon, Oct 10, 2011 at 4:19 PM, Filipe David Manana
>>> wrote:
>>>> On Tue, Oct 11, 2011 at 12:03 AM, Chris Stockton
>>>> wrote:
>>>> Chris,
>>>>
>>>> That said work is in the '1.2.x' branch (and master).
>>>> CouchDB recently migrated from SVN to GIT, see:
>>>> http://couchdb.apache.org/community/code.html
>>>>
>>>
>>> Thank you very much for the response Filipe, do you possibly have any
>>> documentation or more detailed summary on what these changes include
>>> and possible benefits of them? I would love to hear about any tweaking
>>> or replication tips you may have for our growth issues, perhaps you
>>> could answer a basic question if nothing else: Do the changes in this
>>> branch minimize the performance impact of continuous replication on
>>> many databases?
>>>
>>> Regardless I plan on getting a build of that branch and doing some
>>> testing of my own very soon.
>>>
>>> Thank you!
>>>
>>> -Chris
>>
>> I'm pretty sure that even in 1.2.x and master each replication with a remote source still requires one dedicated TCP connection to consume the _changes feed. Replications with a local source have always been able to use a connection pool per host:port combination. That's not to downplay the significance of the rewrite of the replicator in 1.2.x; Filipe put quite a lot of time into it.
>>
>> The link to "those darn errors" just pointed to the mbox browser for September 2011. Do you have a more specific link? Regards,
>>
>> Adam
>
> Well I will remain optimistic that the rewrite could hopefully have
> solved several of my issues regardless I hope. I guess the idle TCP
> connections by themselves are not too bad, when they all start to work
> simultaneously I think is what becomes the issue =)
>
> Sorry Adam, here is a better link
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201109.mbox/%3CCALKFbxuugLJJY-NH46U0u584L+XDqM3NGSpeNxsJyrxosPEuCg@mail.gmail.com%3E,
> the actual text was:
>
> ---------------
>
> It seems that randomly I am getting errors about crashes as our
> replicator runs, all this replicator does is make sure that all
> databases on the master server replicate to our failover by checking
> status.
>
> Details:
>  - I notice the below error in the logs, anywhere from 0 to 30 at a time.
>  - It seems that a database might start replicating okay then stop.
>  - These errors [1] are on the failover pulling from master
>  - No errors are displayed on the master server
>  - The databases inside the URL in the db_not_found portion of the
> error, are always available from curl from the failover machine, which
> makes the error strange, somehow it thinks it can't find the database
>  - Master seems healthy at all times, all database are available, no
> errors in log
>
> [1] --
>  [Mon, 12 Sep 2011 18:34:14 GMT] [error] [<0.22466.5305>]
> {error_report,<0.30.0>,
>                          {<0.22466.5305>,crash_report,
>                           [[{initial_call,{couch_rep,init,['Argument__1']}},
>                             {pid,<0.22466.5305>},
>                             {registered_name,[]},
>                             {error_info,
>                              {exit,
>                               {db_not_found,
>                                <<"http://user:pass@server:5984/db_10944/">>},
>                               [{gen_server,init_it,6},
>                                {proc_lib,init_p_do_apply,3}]}},
>                             {ancestors,
>                              [couch_rep_sup,couch_primary_services,
>                               couch_server_sup,<0.31.0>]},
>                             {messages,[]},
>                             {links,[<0.81.0>]},
>                             {dictionary,[]},
>                             {trap_exit,true},
>                             {status,running},
>                             {heap_size,2584},
>                             {stack_size,24},
>                             {reductions,794}],
>                            []]}}
>

One place I've seen this error pop up when it looks like it shouldn't
is if couch_server gets backed up. If you remsh into one of those db's
you could try the following:

> process_info(whereis(couch_server), message_queue_len).

And if that number keeps growing, that could be the issue.
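
A single reading won't tell you whether the queue is growing, so one way to run that check is to take a few samples from the same remsh session. This is only a rough sketch: `Sample` and its arguments are names I made up for illustration (nothing CouchDB ships), and it assumes the process is registered locally under the given name, as couch_server is in the call above.

```erlang
%% Hypothetical shell helper, not part of CouchDB.
%% Takes N samples of the message queue length of the process registered
%% as Name, pausing IntervalMs milliseconds after each sample.
%% Fails with badarg if Name is not registered on this node.
Sample = fun(Name, N, IntervalMs) ->
    [begin
         {message_queue_len, Len} =
             process_info(whereis(Name), message_queue_len),
         timer:sleep(IntervalMs),
         Len
     end || _ <- lists:seq(1, N)]
end.
```

Calling `Sample(couch_server, 5, 1000).` would then return five readings taken one second apart. A list that climbs steadily points at couch_server falling behind; readings that stay near zero suggest looking elsewhere.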