incubator-couchdb-user mailing list archives

From kowsik <kow...@gmail.com>
Subject Re: CouchDB Replication lacking resilience for many database
Date Tue, 11 Oct 2011 04:28:34 GMT
Chris,
You might want to read this:

http://blog.mudynamics.com/2011/09/05/help-couchdb-break-the-c10k-barrier/

Make sure that your default 'ulimit -n' is pretty high. Under heavy
load, I've seen the replicator get "backed up" and start consuming
precious RAM until it is completely wedged. With 1.1, you also have
the replicator-giving-up syndrome, which has now been fixed in trunk
(with infinite retries). We have background workers on blitz.io that
monitor the replicator task status and kick the tasks when they go
into an error state. A kludgy hack, but one that works pretty well in
production.
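
A rough sketch of what such a "kick" worker could look like follows. It
is not the blitz.io code. It assumes the replications are defined as
documents in the _replicator database (available since 1.1), that a
failed one carries _replication_state "error", and that rewriting the
document without its _replication_* fields makes CouchDB retry it; that
last point should be verified against your version. COUCH_URL and all
function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumption: local CouchDB, no auth
REPLICATOR_DB = "_replicator"


def errored_docs(rows):
    """Return replication docs whose _replication_state is 'error'."""
    docs = (row.get("doc") or {} for row in rows)
    return [
        d for d in docs
        if not d.get("_id", "").startswith("_design/")
        and d.get("_replication_state") == "error"
    ]


def reset_doc(doc):
    """Copy of doc without the _replication_* bookkeeping fields."""
    return {k: v for k, v in doc.items() if not k.startswith("_replication_")}


def _request(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def kick_errored(base_url=COUCH_URL):
    """Rewrite every errored replication doc; return the ids touched."""
    listing = _request(
        "GET", f"{base_url}/{REPLICATOR_DB}/_all_docs?include_docs=true"
    )
    kicked = []
    for doc in errored_docs(listing["rows"]):
        doc_id = urllib.parse.quote(doc["_id"], safe="")
        # Assumption: saving the doc without _replication_* triggers a retry.
        _request("PUT", f"{base_url}/{REPLICATOR_DB}/{doc_id}", reset_doc(doc))
        kicked.append(doc["_id"])
    return kicked
```

Running kick_errored() from cron or a small loop every minute or so
would approximate the behaviour described above. Replications started
through POST /_replicate are not covered by this sketch.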

You might also want to add this to your local.ini:

socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}]

which helps quite a bit with the _changes feed.
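
For anyone unfamiliar with the Erlang option names: recbuf, sndbuf and
nodelay correspond to the usual SO_RCVBUF, SO_SNDBUF and TCP_NODELAY
socket options. The snippet below is only an illustration of those
three options on a plain Python socket, not anything CouchDB runs; the
function name is made up. The kernel may adjust or cap the buffer sizes
you ask for, so it reads the effective values back.

```python
import socket


def apply_socket_options(sock, recbuf=262144, sndbuf=262144, nodelay=True):
    """Set receive/send buffer sizes and TCP_NODELAY; return effective values."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recbuf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if nodelay else 0)
    # Read the values back: the OS is free to round or clamp buffer sizes.
    return {
        "recbuf": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        "sndbuf": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        "nodelay": bool(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)),
    }


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(apply_socket_options(s))
```

If the reported buffer sizes come back much smaller than requested, the
system-wide socket buffer limits are probably the thing to look at.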

K.
---
http://blog.mudynamics.com
http://blitz.io
@pcapr

On Mon, Oct 10, 2011 at 9:03 PM, Chris Stockton
<chrisstocktonaz@gmail.com> wrote:
> Hello,
>
> On Mon, Oct 10, 2011 at 5:18 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>> On Oct 10, 2011, at 8:02 PM, Chris Stockton wrote:
>>
>>> Hello,
>>>
>>> On Mon, Oct 10, 2011 at 4:19 PM, Filipe David Manana
>>> <fdmanana@apache.org> wrote:
>>>> On Tue, Oct 11, 2011 at 12:03 AM, Chris Stockton
>>>> <chrisstocktonaz@gmail.com> wrote:
>>>> Chris,
>>>>
>>>> That said work is in the '1.2.x' branch (and master).
>>>> CouchDB recently migrated from SVN to GIT, see:
>>>> http://couchdb.apache.org/community/code.html
>>>>
>>>
>>> Thank you very much for the response Filipe, do you possibly have any
>>> documentation or more detailed summary on what these changes include
>>> and possible benefits of them? I would love to hear about any tweaking
>>> or replication tips you may have for our growth issues, perhaps you
>>> could answer a basic question if nothing else: Do the changes in this
>>> branch minimize the performance impact of continuous replication on
>>> many databases?
>>>
>>> Regardless I plan on getting a build of that branch and doing some
>>> testing of my own very soon.
>>>
>>> Thank you!
>>>
>>> -Chris
>>
>> I'm pretty sure that even in 1.2.x and master each replication with a
>> remote source still requires one dedicated TCP connection to consume
>> the _changes feed.  Replications with a local source have always been
>> able to use a connection pool per host:port combination.  That's not
>> to downplay the significance of the rewrite of the replicator in
>> 1.2.x; Filipe put quite a lot of time into it.
>>
>> The link to "those darn errors" just pointed to the mbox browser for
>> September 2011.  Do you have a more specific link?  Regards,
>>
>> Adam
>
> Well, I will remain optimistic that the rewrite has solved several of
> my issues regardless. I guess the idle TCP connections by themselves
> are not too bad; it is when they all start to work simultaneously
> that it becomes an issue =)
>
> Sorry Adam, here is a better link
> http://mail-archives.apache.org/mod_mbox/couchdb-user/201109.mbox/%3CCALKFbxuugLJJY-NH46U0u584L+XDqM3NGSpeNxsJyrxosPEuCg@mail.gmail.com%3E,
> the actual text was:
>
> ---------------
>
> It seems that randomly I am getting errors about crashes as our
> replicator runs, all this replicator does is make sure that all
> databases on the master server replicate to our failover by checking
> status.
>
> Details:
>  - I notice the below error in the logs, anywhere from 0 to 30 at a time.
>  - It seems that a database might start replicating okay then stop.
>  - These errors [1] are on the failover pulling from master
>  - No errors are displayed on the master server
>  - The databases inside the URL in the db_not_found portion of the
> error, are always available from curl from the failover machine, which
> makes the error strange, somehow it thinks it can't find the database
>  - Master seems healthy at all times, all databases are available, no
> errors in log
>
> [1] --
>  [Mon, 12 Sep 2011 18:34:14 GMT] [error] [<0.22466.5305>]
> {error_report,<0.30.0>,
>                          {<0.22466.5305>,crash_report,
>                           [[{initial_call,{couch_rep,init,['Argument__1']}},
>                             {pid,<0.22466.5305>},
>                             {registered_name,[]},
>                             {error_info,
>                              {exit,
>                               {db_not_found,
>                                <<"http://user:pass@server:5984/db_10944/">>},
>                               [{gen_server,init_it,6},
>                                {proc_lib,init_p_do_apply,3}]}},
>                             {ancestors,
>                              [couch_rep_sup,couch_primary_services,
>                               couch_server_sup,<0.31.0>]},
>                             {messages,[]},
>                             {links,[<0.81.0>]},
>                             {dictionary,[]},
>                             {trap_exit,true},
>                             {status,running},
>                             {heap_size,2584},
>                             {stack_size,24},
>                             {reductions,794}],
>                            []]}}
>
