incubator-couchdb-user mailing list archives

From Mark Hahn <m...@boutiquing.com>
Subject Re: CouchDB Replication lacking resilience for many database
Date Tue, 11 Oct 2011 19:07:48 GMT
cool.  Thanks.

On Tue, Oct 11, 2011 at 7:03 AM, Jan Lehnardt <jan@apache.org> wrote:

>
> On Oct 11, 2011, at 14:20 , Mark Hahn wrote:
>
> > It would be nice to have a control panel that displays things like this
> > message queue depth, connection counts, memory consumed, cpu consumed,
> > reads/writes per second, view rebuilds/sec, avg response times, etc.  I'm
> > sure someone could come up with many more pertinent vars.
> >
> > For extra credit the values could be plotted against time.  When someone has
> > a problem they could post the log here.
>
> See /_stats :)
>
> It doesn't have all the things you ask for, but adding new stats isn't
> hard:
>
>  http://wiki.apache.org/couchdb/Adding_Runtime_Statistics
>
> Cheers
> Jan
> --
>
>
>
> >
> > On Mon, Oct 10, 2011 at 10:15 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> >
> >> On Mon, Oct 10, 2011 at 11:03 PM, Chris Stockton
> >> <chrisstocktonaz@gmail.com> wrote:
> >>> Hello,
> >>>
> >>> On Mon, Oct 10, 2011 at 5:18 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
> >>>> On Oct 10, 2011, at 8:02 PM, Chris Stockton wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> On Mon, Oct 10, 2011 at 4:19 PM, Filipe David Manana
> >>>>> <fdmanana@apache.org> wrote:
> >>>>>> On Tue, Oct 11, 2011 at 12:03 AM, Chris Stockton
> >>>>>> <chrisstocktonaz@gmail.com> wrote:
> >>>>>> Chris,
> >>>>>>
> >>>>>> That said work is in the '1.2.x' branch (and master).
> >>>>>> CouchDB recently migrated from SVN to GIT, see:
> >>>>>> http://couchdb.apache.org/community/code.html
> >>>>>>
> >>>>>
> >>>>> Thank you very much for the response Filipe, do you possibly have any
> >>>>> documentation or more detailed summary on what these changes include
> >>>>> and possible benefits of them? I would love to hear about any tweaking
> >>>>> or replication tips you may have for our growth issues, perhaps you
> >>>>> could answer a basic question if nothing else: Do the changes in this
> >>>>> branch minimize the performance impact of continuous replication on
> >>>>> many databases?
> >>>>>
> >>>>> Regardless I plan on getting a build of that branch and doing some
> >>>>> testing of my own very soon.
> >>>>>
> >>>>> Thank you!
> >>>>>
> >>>>> -Chris
> >>>>
> >>>> I'm pretty sure that even in 1.2.x and master each replication with a
> >>>> remote source still requires one dedicated TCP connection to consume the
> >>>> _changes feed.  Replications with a local source have always been able to
> >>>> use a connection pool per host:port combination.  That's not to downplay the
> >>>> significance of the rewrite of the replicator in 1.2.x; Filipe put quite a
> >>>> lot of time into it.
> >>>>
> >>>> The link to "those darn errors" just pointed to the mbox browser for
> >>>> September 2011.  Do you have a more specific link?  Regards,
> >>>>
> >>>> Adam
> >>>
> >>> Well, I will remain optimistic that the rewrite solves several of my
> >>> issues regardless. I guess the idle TCP connections by themselves are
> >>> not too bad; it is when they all start to work simultaneously that I
> >>> think they become the issue =)
> >>>
> >>> Sorry Adam, here is a better link
> >>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201109.mbox/%3CCALKFbxuugLJJY-NH46U0u584L+XDqM3NGSpeNxsJyrxosPEuCg@mail.gmail.com%3E,
> >>> the actual text was:
> >>>
> >>> ---------------
> >>>
> >>> It seems that randomly I am getting errors about crashes as our
> >>> replicator runs, all this replicator does is make sure that all
> >>> databases on the master server replicate to our failover by checking
> >>> status.
> >>>
> >>> Details:
> >>> - I notice the below error in the logs, anywhere from 0 to 30 at a time.
> >>> - It seems that a database might start replicating okay then stop.
> >>> - These errors [1] are on the failover pulling from master
> >>> - No errors are displayed on the master server
> >>> - The databases inside the URL in the db_not_found portion of the
> >>> error, are always available from curl from the failover machine, which
> >>> makes the error strange, somehow it thinks it can't find the database
> >>> - Master seems healthy at all times, all databases are available, no
> >>> errors in log
> >>>
> >>> [1] --
> >>> [Mon, 12 Sep 2011 18:34:14 GMT] [error] [<0.22466.5305>]
> >>> {error_report,<0.30.0>,
> >>>                         {<0.22466.5305>,crash_report,
> >>>                          [[{initial_call,{couch_rep,init,['Argument__1']}},
> >>>                            {pid,<0.22466.5305>},
> >>>                            {registered_name,[]},
> >>>                            {error_info,
> >>>                             {exit,
> >>>                              {db_not_found,
> >>>                               <<"http://user:pass@server:5984/db_10944/">>},
> >>>                              [{gen_server,init_it,6},
> >>>                               {proc_lib,init_p_do_apply,3}]}},
> >>>                            {ancestors,
> >>>                             [couch_rep_sup,couch_primary_services,
> >>>                              couch_server_sup,<0.31.0>]},
> >>>                            {messages,[]},
> >>>                            {links,[<0.81.0>]},
> >>>                            {dictionary,[]},
> >>>                            {trap_exit,true},
> >>>                            {status,running},
> >>>                            {heap_size,2584},
> >>>                            {stack_size,24},
> >>>                            {reductions,794}],
> >>>                           []]}}
> >>>
> >>
> >> One place I've seen this error pop up when it looks like it shouldn't
> >> is if couch_server gets backed up. If you remsh into one of those db's
> >> you could try the following:
> >>
> >> > process_info(whereis(couch_server), message_queue_len).
> >>
> >> And if that number keeps growing, that could be the issue.
> >>
>
>
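
For plotting /_stats values against time as suggested above, a small
poller may be enough. The following is only a rough sketch, not anything
shipped with CouchDB. It assumes that /_stats returns a two-level JSON
object (group, then stat name) whose entries carry a "current" field, as
the 1.x releases did. The base URL, the polling interval, and the
function names flatten_stats, fetch_stats and poll are all made up for
illustration.

```python
import json
import time
import urllib.request


def flatten_stats(stats, field="current"):
    """Flatten {"group": {"name": {"current": 1}}} into {"group.name": 1}.

    The group/name/field layout is an assumption about the /_stats response.
    """
    flat = {}
    for group, entries in stats.items():
        for name, values in entries.items():
            # Skip stats that have no value yet (reported as null) or lack the field.
            if isinstance(values, dict) and values.get(field) is not None:
                flat[f"{group}.{name}"] = values[field]
    return flat


def fetch_stats(base_url="http://127.0.0.1:5984"):
    """Fetch and decode /_stats; base_url is a placeholder for your node."""
    with urllib.request.urlopen(base_url + "/_stats") as response:
        return json.load(response)


def poll(base_url="http://127.0.0.1:5984", interval=10, samples=6):
    """Print "timestamp key value" lines that a plotting tool can consume."""
    for _ in range(samples):
        now = int(time.time())
        for key, value in sorted(flatten_stats(fetch_stats(base_url)).items()):
            print(now, key, value)
        time.sleep(interval)
```

Calling poll() against a running node prints one line per stat per
sample. Redirecting that output to a file gives a log that can be posted
to the list or fed to a plotting tool.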
