couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: CouchDB Replication lacking resilience for many database
Date Tue, 11 Oct 2011 14:03:21 GMT

On Oct 11, 2011, at 14:20 , Mark Hahn wrote:

> It would be nice to have a control panel that displays things like this
> message queue depth, connection counts, memory consumed, cpu consumed,
> reads/writes per second, view rebuilds/sec, avg response times, etc.  I'm
> sure someone could come up with many more pertinent vars.
> 
> For extra credit the values could be plotted against time.  When someone has
> a problem they could post the log here.

See /_stats :)

It doesn't have all the things you ask for, but adding new stats isn't hard: 

  http://wiki.apache.org/couchdb/Adding_Runtime_Statistics
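
For anyone who wants to log or plot these values, here is a minimal Python sketch of reading /_stats. It assumes the 1.x response layout (a group such as "couchdb" or "httpd", then a stat name, then an object with fields like "current" and "mean") and a node at http://localhost:5984. The names flatten_stats, fetch_stats and the sample values are hypothetical, made up for illustration.

```python
import json
from urllib.request import urlopen


def flatten_stats(stats, field="current"):
    """Flatten {group: {name: {field: value}}} into {"group/name": value}."""
    flat = {}
    for group, entries in stats.items():
        for name, detail in entries.items():
            # Assumed layout: each stat is an object holding "current", "mean", etc.
            flat[f"{group}/{name}"] = detail.get(field)
    return flat


def fetch_stats(base_url="http://localhost:5984"):
    """GET /_stats from a CouchDB node and decode the JSON body.

    The default base_url is an assumption; point it at your own node.
    """
    with urlopen(f"{base_url}/_stats") as response:
        return json.load(response)


# Illustrative sample only (not real measurements), shaped like the assumed layout.
sample = {
    "couchdb": {"open_databases": {"description": "open dbs", "current": 12.0}},
    "httpd": {"requests": {"description": "http requests", "current": None}},
}

for key, value in sorted(flatten_stats(sample).items()):
    print(f"{key}\t{value}")
```

Calling flatten_stats(fetch_stats()) on a timer and appending the result to a file would give a simple time series to plot.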

Cheers
Jan
-- 



> 
> On Mon, Oct 10, 2011 at 10:15 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> 
>> On Mon, Oct 10, 2011 at 11:03 PM, Chris Stockton
>> <chrisstocktonaz@gmail.com> wrote:
>>> Hello,
>>> 
>>> On Mon, Oct 10, 2011 at 5:18 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>>>> On Oct 10, 2011, at 8:02 PM, Chris Stockton wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> On Mon, Oct 10, 2011 at 4:19 PM, Filipe David Manana
>>>>> <fdmanana@apache.org> wrote:
>>>>>> On Tue, Oct 11, 2011 at 12:03 AM, Chris Stockton
>>>>>> <chrisstocktonaz@gmail.com> wrote:
>>>>>> Chris,
>>>>>> 
>>>>>> That said work is in the '1.2.x' branch (and master).
>>>>>> CouchDB recently migrated from SVN to GIT, see:
>>>>>> http://couchdb.apache.org/community/code.html
>>>>>> 
>>>>> 
>>>>> Thank you very much for the response Filipe, do you possibly have any
>>>>> documentation or more detailed summary on what these changes include
>>>>> and possible benefits of them? I would love to hear about any tweaking
>>>>> or replication tips you may have for our growth issues, perhaps you
>>>>> could answer a basic question if nothing else: Do the changes in this
>>>>> branch minimize the performance impact of continuous replication on
>>>>> many databases?
>>>>> 
>>>>> Regardless I plan on getting a build of that branch and doing some
>>>>> testing of my own very soon.
>>>>> 
>>>>> Thank you!
>>>>> 
>>>>> -Chris
>>>> 
>>>> I'm pretty sure that even in 1.2.x and master each replication with a
>>>> remote source still requires one dedicated TCP connection to consume the
>>>> _changes feed.  Replications with a local source have always been able to
>>>> use a connection pool per host:port combination.  That's not to downplay the
>>>> significance of the rewrite of the replicator in 1.2.x; Filipe put quite a
>>>> lot of time into it.
>>>> 
>>>> The link to "those darn errors" just pointed to the mbox browser for
>>>> September 2011.  Do you have a more specific link?  Regards,
>>>> 
>>>> Adam
>>> 
>>> Well, I will remain optimistic that the rewrite has solved several of
>>> my issues regardless. I guess the idle TCP connections by themselves
>>> are not too bad; it is when they all start to work simultaneously that
>>> it becomes an issue, I think =)
>>> 
>>> Sorry Adam, here is a better link:
>>>
>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/201109.mbox/%3CCALKFbxuugLJJY-NH46U0u584L+XDqM3NGSpeNxsJyrxosPEuCg@mail.gmail.com%3E
>>>
>>> The actual text was:
>>> 
>>> ---------------
>>> 
>>> It seems that randomly I am getting errors about crashes as our
>>> replicator runs, all this replicator does is make sure that all
>>> databases on the master server replicate to our failover by checking
>>> status.
>>> 
>>> Details:
>>> - I notice the below error in the logs, anywhere from 0 to 30 at a time.
>>> - It seems that a database might start replicating okay then stop.
>>> - These errors [1] are on the failover pulling from master
>>> - No errors are displayed on the master server
>>> - The databases inside the URL in the db_not_found portion of the
>>> error, are always available from curl from the failover machine, which
>>> makes the error strange, somehow it thinks it can't find the database
>>> - Master seems healthy at all times, all databases are available, no
>>> errors in log
>>> 
>>> [1] --
>>> [Mon, 12 Sep 2011 18:34:14 GMT] [error] [<0.22466.5305>]
>>> {error_report,<0.30.0>,
>>>                         {<0.22466.5305>,crash_report,
>>>                          [[{initial_call,{couch_rep,init,['Argument__1']}},
>>>                            {pid,<0.22466.5305>},
>>>                            {registered_name,[]},
>>>                            {error_info,
>>>                             {exit,
>>>                              {db_not_found,
>>>                               <<"http://user:pass@server:5984/db_10944/">>},
>>>                              [{gen_server,init_it,6},
>>>                               {proc_lib,init_p_do_apply,3}]}},
>>>                            {ancestors,
>>>                             [couch_rep_sup,couch_primary_services,
>>>                              couch_server_sup,<0.31.0>]},
>>>                            {messages,[]},
>>>                            {links,[<0.81.0>]},
>>>                            {dictionary,[]},
>>>                            {trap_exit,true},
>>>                            {status,running},
>>>                            {heap_size,2584},
>>>                            {stack_size,24},
>>>                            {reductions,794}],
>>>                           []]}}
>>> 
>> 
>> One place I've seen this error pop up when it looks like it shouldn't
>> is if couch_server gets backed up. If you remsh into one of those db's
>> you could try the following:
>> 
>>> process_info(whereis(couch_server), message_queue_len).
>> 
>> And if that number keeps growing, that could be the issue.
>> 

