incubator-couchdb-user mailing list archives

From Blair Zajac <bl...@orcaware.com>
Subject Re: Replication and new user questions
Date Wed, 26 Aug 2009 20:33:49 GMT
Hi Adam,

Thanks for the quick reply, I appreciate it.

I neglected to mention that we're running trunk r807360.

Replies inline below.

Adam Kocoloski wrote:
> Hi Blair, all good questions, I'll try to answer inline:
> 
> On Aug 25, 2009, at 5:10 PM, Blair Zajac wrote:
> 
>> 1) What's the most robust automatic replication mechanism?  While 
>> continuous replication looks nice, I see there are some tickets open 
>> against it and that it has issues with four nodes.  Is a more robust 
>> solution, though a little slower and heavier, to have an 
>> update_notification that manually POSTs to _replicate?
> 
> We're committed to making continuous replication as robust and 
> performant as possible.  The entire replication codebase went through a 
> significant refactoring after 0.9, and what you're seeing is us ironing 
> out a few of the kinks before 0.10 gets out the door.  I'd encourage you 
> to give "continuous":true a shot, provided my answer to 2) isn't a 
> deal-breaker.

No, 2) isn't a deal breaker.
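
For anyone following along, here is a rough sketch of the request that starts 
a continuous replication: a POST to _replicate whose JSON body sets 
"continuous":true. It is only an illustration under assumptions. The server 
URL, the database names, and the helper names build_replicate_request and 
send are all made up for the example.

```python
import json
import urllib.request


def build_replicate_request(server, source, target, continuous=True):
    """Build (but do not send) a POST to the server's _replicate endpoint."""
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; the host and database names are placeholders:
#   req = build_replicate_request("http://localhost:5984",
#                                 "db1", "http://host2:5984/db2")
#   print(send(req))
```

Keeping the request construction separate from sending it makes the body easy 
to inspect before anything touches the network.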

>> Will there be a way to manage the list of replicant databases when the 
>> persistent continuous replication feature is complete?
> 
> Absolutely yes.  It will probably be a special DB called _replication 
> where you can PUT and DELETE documents that configure continuous 
> replications.

That's great.

Is there a wiki or other place where CouchDB keeps its design documents for new 
features so people can learn about them, e.g. a ticket or a text file checked 
into svn?

>> 3) How does continuous replication deal with network outages, say if a 
>> link goes down between the Los Angeles and Bristol data centers?  Does 
>> CouchDB deal with a hanging TCP connection ok?
> 
> CouchDB retries requests using a timeout that doubles with every 
> failure.  It does this for about 5 minutes, then gives up.

That sounds like it would still require an external script to restart 
replication once CouchDB gives up.

In fact, our Bristol office had a power outage earlier today that lasted over an 
hour, so having to write a script to kick-start replication again would be 
inconvenient.
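
To make the retry behavior Adam describes concrete, here is a rough sketch of 
a timeout that doubles after every failure and stops once about five minutes 
have been spent waiting. The 1-second starting delay, the exact 300-second 
cutoff rule, and the function names are my assumptions for illustration. This 
is not CouchDB's actual code or its real values.

```python
import time


def backoff_delays(initial=1.0, budget=300.0):
    """Yield waits that double after each failure until the budget is used up."""
    delay, spent = initial, 0.0
    while spent + delay <= budget:
        yield delay
        spent += delay
        delay *= 2


def retry_with_backoff(action, initial=1.0, budget=300.0, sleep=time.sleep):
    """Call action() until it succeeds or the backoff budget is exhausted.

    Returns action()'s result; re-raises the last OSError after giving up.
    """
    delays = backoff_delays(initial, budget)
    while True:
        try:
            return action()
        except OSError:  # network failures; urllib's URLError is an OSError
            delay = next(delays, None)
            if delay is None:
                raise  # budget exhausted: give up
            sleep(delay)
```

With these assumed values the waits are 1, 2, 4, ... 128 seconds, 255 seconds 
in total, so an outage of an hour outlasts the schedule and something external 
still has to start replication again.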

>> 5) I wrote the following Bourne shell script and after running it for 
>> an hour, it consumes 100% of a CPU.  This is even after stopping the 
>> shell script and compacting both databases.  What would explain this 
>> behavior?
> 
> I couldn't quite get that script to work ($HOST2 was undefined, and then 
> something else failed), but can you try it again with a fresh checkout?  
> I fixed a bug last night that could very well have caused this.  Best,

I've attached the latest version of the script which I just ran.

After multiple runs of the script and letting it run indefinitely, I've noticed 
that something will fail in CouchDB, and then either the script waits forever 
for the key to appear in the other database or a PUT fails.
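
As an aside, the "wait forever" case can be bounded with a deadline, so a test 
run reports the failure instead of hanging. The sketch below is not the 
attached script. It is a Python illustration with hypothetical names (wait_for, 
doc_exists) that assumes a missing document shows up as a 404 on GET.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def doc_exists(server, db, doc_id, timeout=10):
    """Return True if GET /db/doc_id succeeds, False if it returns 404."""
    url = "{}/{}/{}".format(
        server.rstrip("/"), db, urllib.parse.quote(doc_id, safe=""))
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error is a real failure worth surfacing


def wait_for(check, deadline=60.0, interval=0.5,
             sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it is true; return False once the deadline passes."""
    give_up_at = clock() + deadline
    while True:
        if check():
            return True
        if clock() >= give_up_at:
            return False
        sleep(interval)


# Hypothetical usage; the host, database, and key are placeholders:
#   ok = wait_for(lambda: doc_exists("http://host2:5984", "db2", "some-key"))
```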

The last error I got was a PUT via curl that returned nothing, along with the 
error below in my shell.  The output scrolled off the top of the terminal, so I 
don't have the top of the stack:

** Reason for termination ==
** changes_loop_died
[error] [<0.144.2060>] {error_report,<0.23.0>,
               {<0.144.2060>,crash_report,
                [[{initial_call,{couch_rep,init,['Argument__1']}},
                  {pid,<0.144.2060>},
                  {registered_name,[]},
                  {error_info,{exit,changes_loop_died,
                                    [{gen_server,terminate,6},
                                     {proc_lib,init_p_do_apply,3}]}},
                  {ancestors,[couch_rep_sup,couch_primary_services,
                              couch_server_sup,<0.1.0>]},
                  {messages,[]},
                  {links,[<0.162.2060>,<0.164.2060>,<0.103.2060>,<0.160.2060>]},
                  {dictionary,[{task_status_update,{{1251,317928,943764},0}}]},
                  {trap_exit,true},
                  {status,running},
                  {heap_size,2584},
                  {stack_size,24},
                  {reductions,2630900}],
                 [{neighbour,[{pid,<0.164.2060>},
                              {registered_name,[]},
                              {initial_call,{erlang,apply,2}},
                              {current_function,{gen,wait_resp_mon,3}},
                              {ancestors,[]},
                              {messages,[]},
                              {links,[<0.144.2060>]},
                              {dictionary,[]},
                              {trap_exit,false},
                              {status,waiting},
                              {heap_size,987},
                              {stack_size,17},
                              {reductions,844815}]}]]}}
[error] [<0.160.2060>] ** Generic server <0.160.2060> terminating
** Last message in was {'EXIT',<0.144.2060>,changes_loop_died}
** When Server state == {state,<0.161.2060>,nil,
                             {db,<0.153.2060>,<0.154.2060>,nil,
                                 <<"1251313203438699">>,<0.151.2060>,
                                 <0.155.2060>,
                                 {db_header,4,44914,0,
                                     {272424021,{6,12656}},
                                     {272425998,12662},
                                     {272398470,[]},
                                     0,nil,nil,1000},
                                 44914,
                                 {btree,<0.151.2060>,
                                     {272424021,{6,12656}},
                                     #Fun<couch_db_updater.8.117532479>,
                                     #Fun<couch_db_updater.9.105507025>,
                                     #Fun<couch_db_updater.7.32442936>,
                                     #Fun<couch_db_updater.10.43662179>},
                                 {btree,<0.151.2060>,
                                     {272425998,12662},
                                     #Fun<couch_db_updater.11.41695917>,
                                     #Fun<couch_db_updater.12.6934644>,
                                     #Fun<couch_btree.5.124754102>,
                                     #Fun<couch_db_updater.13.28245598>},
                                 {btree,<0.151.2060>,
                                     {272398470,[]},
                                     #Fun<couch_btree.0.83553141>,
                                     #Fun<couch_btree.1.30790806>,
                                     #Fun<couch_btree.2.124754102>,nil},
                                 44914,<<"db2">>,
                                 "/tmp/blair/couchdb.git-3/etc/couchdb/../../tmp/lib/db2.couch",
                                 [],[],nil,
                                 {user_ctx,null,[<<"_admin">>]},
                                 nil,1000,
                                 [before_header,after_header,on_file_open]},
                             <0.144.2060>,false,0,
                             {<0.163.2060>,#Ref<0.0.54.110898>},
                             {[],[]},
                             57374,57374,57374}
** Reason for termination ==
** changes_loop_died

Regards,
Blair

