couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: CouchDB Replication lacking resilience for many database
Date Tue, 11 Oct 2011 05:15:51 GMT
On Mon, Oct 10, 2011 at 11:03 PM, Chris Stockton
<chrisstocktonaz@gmail.com> wrote:
> Hello,
>
> On Mon, Oct 10, 2011 at 5:18 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>> On Oct 10, 2011, at 8:02 PM, Chris Stockton wrote:
>>
>>> Hello,
>>>
>>> On Mon, Oct 10, 2011 at 4:19 PM, Filipe David Manana
>>> <fdmanana@apache.org> wrote:
>>>> On Tue, Oct 11, 2011 at 12:03 AM, Chris Stockton
>>>> <chrisstocktonaz@gmail.com> wrote:
>>>> Chris,
>>>>
>>>> That said work is in the '1.2.x' branch (and master).
>>>> CouchDB recently migrated from SVN to GIT, see:
>>>> http://couchdb.apache.org/community/code.html
>>>>
>>>
>>> Thank you very much for the response Filipe, do you possibly have any
>>> documentation or more detailed summary on what these changes include
>>> and possible benefits of them? I would love to hear about any tweaking
>>> or replication tips you may have for our growth issues, perhaps you
>>> could answer a basic question if nothing else: Do the changes in this
>>> branch minimize the performance impact of continuous replication on
>>> many databases?
>>>
>>> Regardless I plan on getting a build of that branch and doing some
>>> testing of my own very soon.
>>>
>>> Thank you!
>>>
>>> -Chris
>>
>> I'm pretty sure that even in 1.2.x and master each replication with a
>> remote source still requires one dedicated TCP connection to consume
>> the _changes feed.  Replications with a local source have always been
>> able to use a connection pool per host:port combination.  That's not
>> to downplay the significance of the rewrite of the replicator in
>> 1.2.x; Filipe put quite a lot of time into it.
>>
>> The link to "those darn errors" just pointed to the mbox browser for
>> September 2011.  Do you have a more specific link?  Regards,
>>
>> Adam
>
> Well, I will remain optimistic that the rewrite has solved several of
> my issues regardless. I guess the idle TCP connections by themselves
> are not too bad; I think the issue comes when they all start to work
> simultaneously =)
>
> Sorry Adam, here is a better link
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201109.mbox/%3CCALKFbxuugLJJY-NH46U0u584L+XDqM3NGSpeNxsJyrxosPEuCg@mail.gmail.com%3E,
> the actual text was:
>
> ---------------
>
> It seems that randomly I am getting errors about crashes as our
> replicator runs, all this replicator does is make sure that all
> databases on the master server replicate to our failover by checking
> status.
>
> Details:
>  - I notice the below error in the logs, anywhere from 0 to 30 at a time.
>  - It seems that a database might start replicating okay then stop.
>  - These errors [1] are on the failover pulling from master
>  - No errors are displayed on the master server
>  - The databases inside the URL in the db_not_found portion of the
> error, are always available from curl from the failover machine, which
> makes the error strange, somehow it thinks it can't find the database
>  - Master seems healthy at all times, all databases are available, no
> errors in log
>
> [1] --
>  [Mon, 12 Sep 2011 18:34:14 GMT] [error] [<0.22466.5305>]
> {error_report,<0.30.0>,
>                          {<0.22466.5305>,crash_report,
>                           [[{initial_call,{couch_rep,init,['Argument__1']}},
>                             {pid,<0.22466.5305>},
>                             {registered_name,[]},
>                             {error_info,
>                              {exit,
>                               {db_not_found,
>                                <<"http://user:pass@server:5984/db_10944/">>},
>                               [{gen_server,init_it,6},
>                                {proc_lib,init_p_do_apply,3}]}},
>                             {ancestors,
>                              [couch_rep_sup,couch_primary_services,
>                               couch_server_sup,<0.31.0>]},
>                             {messages,[]},
>                             {links,[<0.81.0>]},
>                             {dictionary,[]},
>                             {trap_exit,true},
>                             {status,running},
>                             {heap_size,2584},
>                             {stack_size,24},
>                             {reductions,794}],
>                            []]}}
>

One place I've seen this error pop up when it looks like it shouldn't
is when couch_server's message queue gets backed up. If you remsh into
one of those nodes, you could try the following:

    > process_info(whereis(couch_server), message_queue_len).

And if that number keeps growing, that could be the issue.
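
To see whether the number is actually growing, it can help to sample it
a few times rather than re-running the call by hand. A rough sketch,
assuming you are in the same remsh session and couch_server is
registered on that node; Sample is a hypothetical name used only for
illustration, not something CouchDB provides:

```erlang
%% Sample couch_server's message queue length N times, IntervalMs apart.
%% "Sample" is a made-up helper name for this illustration.
%% Fails with badarg if couch_server is not registered on this node.
Sample = fun(N, IntervalMs) ->
    [begin
         {message_queue_len, Len} =
             process_info(whereis(couch_server), message_queue_len),
         timer:sleep(IntervalMs),
         Len
     end || _ <- lists:seq(1, N)]
end.

%% Five samples, one second apart; returns a list of five integers.
Sample(5, 1000).
```

A list that climbs steadily from one sample to the next would point at
a backed-up couch_server; a list that stays near zero would suggest
looking elsewhere.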
