incubator-couchdb-user mailing list archives

From Mark Hahn <m...@boutiquing.com>
Subject Re: CouchDB Replication lacking resilience for many database
Date Tue, 11 Oct 2011 12:20:55 GMT
It would be nice to have a control panel that displays things like this:
message queue depth, connection counts, memory consumed, CPU consumed,
reads/writes per second, view rebuilds per second, average response times,
etc.  I'm sure someone could come up with many more pertinent variables.

For extra credit, the values could be plotted against time.  When someone
has a problem they could post the resulting log here.
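As a starting point, CouchDB 1.x already exposes a number of counters at the `/_stats` endpoint; a small poller could collect them over time. The sketch below is a minimal illustration, not a finished tool: the URL is a hypothetical local node, and the exact metric names are assumptions based on the 1.x stats layout, so adjust them to whatever your build actually reports.

```python
# Minimal sketch of a stats poller against CouchDB's /_stats endpoint
# (CouchDB 1.x). Metric names below are assumptions drawn from the 1.x
# stats layout; adjust to whatever your node actually exposes.
import json
import time
import urllib.request

STATS_URL = "http://localhost:5984/_stats"  # hypothetical local node

def extract_metrics(stats):
    """Pull a few time-series-friendly numbers out of a /_stats document."""
    def current(section, name):
        # Each stat in 1.x is a small object with a "current" counter.
        return stats.get(section, {}).get(name, {}).get("current") or 0
    return {
        "open_databases": current("couchdb", "open_databases"),
        "requests": current("httpd", "requests"),
        "db_reads": current("couchdb", "database_reads"),
        "db_writes": current("couchdb", "database_writes"),
    }

def poll(interval=10, samples=6):
    """Collect (timestamp, metrics) pairs; these could be plotted later."""
    series = []
    for _ in range(samples):
        with urllib.request.urlopen(STATS_URL) as resp:
            stats = json.load(resp)
        series.append((time.time(), extract_metrics(stats)))
        time.sleep(interval)
    return series
```

Plotting the collected `series` against time would then only need a separate graphing step.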

On Mon, Oct 10, 2011 at 10:15 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:

> On Mon, Oct 10, 2011 at 11:03 PM, Chris Stockton
> <chrisstocktonaz@gmail.com> wrote:
> > Hello,
> >
> >> On Mon, Oct 10, 2011 at 5:18 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
> >> On Oct 10, 2011, at 8:02 PM, Chris Stockton wrote:
> >>
> >>> Hello,
> >>>
> >>> On Mon, Oct 10, 2011 at 4:19 PM, Filipe David Manana
> >>> <fdmanana@apache.org> wrote:
> >>>> On Tue, Oct 11, 2011 at 12:03 AM, Chris Stockton
> >>>> <chrisstocktonaz@gmail.com> wrote:
> >>>> Chris,
> >>>>
> >>>> That said, the work is in the '1.2.x' branch (and master).
> >>>> CouchDB recently migrated from SVN to GIT, see:
> >>>> http://couchdb.apache.org/community/code.html
> >>>>
> >>>
> >>> Thank you very much for the response Filipe, do you possibly have any
> >>> documentation or more detailed summary on what these changes include
> >>> and possible benefits of them? I would love to hear about any tweaking
> >>> or replication tips you may have for our growth issues, perhaps you
> >>> could answer a basic question if nothing else: Do the changes in this
> >>> branch minimize the performance impact of continuous replication on
> >>> many databases?
> >>>
> >>> Regardless I plan on getting a build of that branch and doing some
> >>> testing of my own very soon.
> >>>
> >>> Thank you!
> >>>
> >>> -Chris
> >>
> >> I'm pretty sure that even in 1.2.x and master each replication with a
> >> remote source still requires one dedicated TCP connection to consume
> >> the _changes feed.  Replications with a local source have always been
> >> able to use a connection pool per host:port combination.  That's not to
> >> downplay the significance of the rewrite of the replicator in 1.2.x;
> >> Filipe put quite a lot of time into it.
> >>
> >> The link to "those darn errors" just pointed to the mbox browser for
> >> September 2011.  Do you have a more specific link?  Regards,
> >>
> >> Adam
> >
> > Well, I will remain optimistic that the rewrite has solved several of
> > my issues. I guess the idle TCP connections by themselves are not too
> > bad; it's when they all start working simultaneously that they become
> > a problem. =)
> >
> > Sorry Adam, here is a better link:
> > http://mail-archives.apache.org/mod_mbox/couchdb-user/201109.mbox/%3CCALKFbxuugLJJY-NH46U0u584L+XDqM3NGSpeNxsJyrxosPEuCg@mail.gmail.com%3E
> > The actual text was:
> >
> > ---------------
> >
> > It seems that randomly I am getting errors about crashes as our
> > replicator runs, all this replicator does is make sure that all
> > databases on the master server replicate to our failover by checking
> > status.
> >
> > Details:
> >  - I notice the below error in the logs, anywhere from 0 to 30 at a time.
> >  - It seems that a database might start replicating okay then stop.
> >  - These errors [1] are on the failover pulling from master
> >  - No errors are displayed on the master server
> >  - The databases inside the URL in the db_not_found portion of the
> > error are always reachable via curl from the failover machine, which
> > makes the error strange: somehow CouchDB thinks it can't find the database
> >  - Master seems healthy at all times, all databases are available, no
> > errors in the log
> >
> > [1] --
> >  [Mon, 12 Sep 2011 18:34:14 GMT] [error] [<0.22466.5305>]
> > {error_report,<0.30.0>,
> >     {<0.22466.5305>,crash_report,
> >      [[{initial_call,{couch_rep,init,['Argument__1']}},
> >        {pid,<0.22466.5305>},
> >        {registered_name,[]},
> >        {error_info,
> >         {exit,
> >          {db_not_found,
> >           <<"http://user:pass@server:5984/db_10944/">>},
> >          [{gen_server,init_it,6},
> >           {proc_lib,init_p_do_apply,3}]}},
> >        {ancestors,
> >         [couch_rep_sup,couch_primary_services,
> >          couch_server_sup,<0.31.0>]},
> >        {messages,[]},
> >        {links,[<0.81.0>]},
> >        {dictionary,[]},
> >        {trap_exit,true},
> >        {status,running},
> >        {heap_size,2584},
> >        {stack_size,24},
> >        {reductions,794}],
> >       []]}}
> >
>
> One place I've seen this error pop up when it looks like it shouldn't
> is when couch_server gets backed up. If you remsh into one of those
> nodes you could try the following:
>
>    > process_info(whereis(couch_server), message_queue_len).
>
> And if that number keeps growing, that could be the issue.
>
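Paul's check above could also be run repeatedly from outside the VM via erl_call (shipped with Erlang/OTP's erl_interface), which evaluates an expression on a running node. This is only a sketch under stated assumptions: the node name and cookie below are hypothetical placeholders, and the exact erl_call output format may vary by OTP release, so the parser just looks for the `{message_queue_len,N}` tuple.

```python
# Sketch: run the message_queue_len check on a live node via erl_call
# and flag a steadily growing queue. NODE and COOKIE are hypothetical;
# use the values your CouchDB node actually runs with.
import re
import subprocess
import time

NODE = "couchdb@localhost"   # assumption: node name of the CouchDB VM
COOKIE = "monster"           # assumption: see ~/.erlang.cookie

CHECK = b"process_info(whereis(couch_server), message_queue_len)."

def parse_queue_len(output):
    """Extract N from output containing {message_queue_len,N}, else None."""
    m = re.search(r"message_queue_len,\s*(\d+)", output)
    return int(m.group(1)) if m else None

def sample_queue_len():
    # erl_call -e evaluates Erlang expressions read from stdin.
    out = subprocess.run(
        ["erl_call", "-e", "-n", NODE, "-c", COOKIE],
        input=CHECK, capture_output=True,
    ).stdout.decode()
    return parse_queue_len(out)

def watch(samples=5, interval=5):
    """Monotonically growing queue lengths suggest couch_server is backed up."""
    lens = []
    for _ in range(samples):
        n = sample_queue_len()
        if n is not None:
            lens.append(n)
        time.sleep(interval)
    growing = len(lens) > 1 and all(b > a for a, b in zip(lens, lens[1:]))
    return lens, growing
```

A queue length that stays near zero between samples would point away from couch_server backlog as the cause.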
