Subject: Re: svn commit: r1080950 - in /couchdb/trunk/src/couchdb: couch_replication_manager.erl couch_replicator.erl
From: Randall Leeds
To: dev@couchdb.apache.org
Date: Mon, 14 Mar 2011 14:46:22 -0700

Is this very useful? We already have retry logic in couch_api_wrap_httpc generically for all #httpdb operations.

On Sat, Mar 12, 2011 at 09:25, wrote:
>
> Author: fdmanana
> Date: Sat Mar 12 17:25:33 2011
> New Revision: 1080950
>
> URL: http://svn.apache.org/viewvc?rev=1080950&view=rev
> Log:
> Replication manager: restart replications that end up in an error state
>
> Closes COUCHDB-1085
>
> Modified:
>    couchdb/trunk/src/couchdb/couch_replication_manager.erl
>    couchdb/trunk/src/couchdb/couch_replicator.erl
>
> Modified: couchdb/trunk/src/couchdb/couch_replication_manager.erl
> URL: http://svn.apache.org/viewvc/couchdb/trunk/src/couchdb/couch_replication_manager.erl?rev=1080950&r1=1080949&r2=1080950&view=diff
> ==============================================================================
> --- couchdb/trunk/src/couchdb/couch_replication_manager.erl (original)
> +++ couchdb/trunk/src/couchdb/couch_replication_manager.erl Sat Mar 12 17:25:33 2011
> @@ -27,6 +27,7 @@
>  -record(rep_state, {
>      rep,
>      starting,
> +    retries_left,
>      max_retries
>  }).
>
> @@ -113,14 +114,36 @@ handle_call({rep_complete, RepId}, _From
>      {reply, ok, State};
>
>  handle_call({rep_error, RepId, Error}, _From, State) ->
> -    #rep_state{rep = #rep{doc_id = DocId}} = rep_state(RepId),
> -    couch_replicator:cancel_replication(RepId),
> -    true = ets:delete(?REP_TO_STATE, RepId),
> -    true = ets:delete(?DOC_TO_REP, DocId),
> -    ?LOG_ERROR("Error in replication `~s` (triggered by document `~s`): ~s",
> -        [pp_rep_id(RepId), DocId, to_binary(error_reason(Error))]),
> -    update_rep_doc(DocId, [{<<"_replication_state">>, <<"error">>}]),
> -    {reply, ok, State};
> +    #rep_state{
> +        rep = #rep{doc_id = DocId} = Rep,
> +        retries_left = RetriesLeft,
> +        max_retries = MaxRetries
> +    } = RepState = rep_state(RepId),
> +    NewState = case RetriesLeft > 0 of
> +    false ->
> +        couch_replicator:cancel_replication(RepId),
> +        true = ets:delete(?REP_TO_STATE, RepId),
> +        true = ets:delete(?DOC_TO_REP, DocId),
> +        ?LOG_ERROR("Error in replication `~s` (triggered by document `~s`): ~s"
> +            "~nReached maximum retry attempts (~p).",
> +            [pp_rep_id(RepId), DocId,
> +                to_binary(error_reason(Error)), MaxRetries]),
> +        State;
> +    true ->
> +        NewRepState = RepState#rep_state{retries_left = RetriesLeft - 1},
> +        true = ets:insert(?REP_TO_STATE, {RepId, NewRepState}),
> +        Wait = wait_period(NewRepState),
> +        ?LOG_ERROR("Error in replication `~s` (triggered by document `~s`): ~s"
> +            "~nRestarting replication in ~p seconds.",
> +            [pp_rep_id(RepId), DocId,
> +                to_binary(error_reason(Error)), Wait]),
> +        Server = self(),
> +        Pid = spawn_link(fun() -> start_replication(Server, Rep, Wait) end),
> +        State#state{rep_start_pids = [Pid | State#state.rep_start_pids]}
> +    end,
> +    % TODO: maybe add error reason to replication document
> +    update_rep_doc(DocId, [{<<"_replication_state">>, <<"error">>}]),
> +    {reply, ok, NewState};
>
>  handle_call(Msg, From, State) ->
>      ?LOG_ERROR("Replication manager received unexpected call ~p from ~p",
> @@ -332,6 +355,7 @@ maybe_start_replication(State, DocId, Re
>          RepState = #rep_state{
>              rep = Rep,
>              starting = true,
> +            retries_left = State#state.max_retries,
>              max_retries = State#state.max_retries
>          },
>          true = ets:insert(?REP_TO_STATE, {RepId, RepState}),
> @@ -339,9 +363,7 @@ maybe_start_replication(State, DocId, Re
>          ?LOG_INFO("Attempting to start replication `~s` (document `~s`).",
>              [pp_rep_id(RepId), DocId]),
>          Server = self(),
> -        Pid = spawn_link(fun() ->
> -            start_replication(Server, Rep, State#state.max_retries)
> -        end),
> +        Pid = spawn_link(fun() -> start_replication(Server, Rep, 0) end),
>          State#state{rep_start_pids = [Pid | State#state.rep_start_pids]};
>      #rep_state{rep = #rep{doc_id = DocId}} ->
>          State;
> @@ -367,32 +389,15 @@ maybe_tag_rep_doc(DocId, {RepProps}, Rep
>      end.
>
>
> -start_replication(Server, #rep{id = RepId, doc_id = DocId} = Rep, MaxRetries) ->
> +start_replication(Server, #rep{id = RepId, doc_id = DocId} = Rep, Wait) ->
> +    ok = timer:sleep(Wait * 1000),
>      case (catch couch_replicator:async_replicate(Rep)) of
>      {ok, _} ->
>          ok = gen_server:call(Server, {rep_started, RepId}, infinity),
>          ?LOG_INFO("Document `~s` triggered replication `~s`",
>              [DocId, pp_rep_id(RepId)]);
>      Error ->
> -        keep_retrying(Server, Rep, Error, ?INITIAL_WAIT, MaxRetries)
> -    end.
> -
> -
> -keep_retrying(Server, Rep, Error, _Wait, 0) ->
> -    ok = gen_server:call(Server, {rep_start_failure, Rep, Error}, infinity);
> -
> -keep_retrying(Server, #rep{doc_id = DocId} = Rep, Error, Wait, RetriesLeft) ->
> -    ?LOG_ERROR("Error starting replication `~s` (document `~s`): ~p. "
> -        "Retrying in ~p seconds", [pp_rep_id(Rep), DocId, Error, Wait]),
> -    ok = timer:sleep(Wait * 1000),
> -    case (catch couch_replicator:async_replicate(Rep)) of
> -    {ok, _} ->
> -        ok = gen_server:call(Server, {rep_started, Rep#rep.id}, infinity),
> -        #rep_state{max_retries = MaxRetries} = rep_state(Rep#rep.id),
> -        ?LOG_INFO("Document `~s` triggered replication `~s` after ~p attempts",
> -            [DocId, pp_rep_id(Rep), MaxRetries - RetriesLeft + 1]);
> -    NewError ->
> -        keep_retrying(Server, Rep, NewError, Wait * 2, RetriesLeft - 1)
> +        ok = gen_server:call(Server, {rep_error, RepId, Error}, infinity)
>      end.
>
>
> @@ -502,3 +507,12 @@ error_reason({error, Reason}) ->
>      Reason;
>  error_reason(Reason) ->
>      Reason.
> +
> +
> +wait_period(#rep_state{max_retries = Max, retries_left = Left}) ->
> +    wait_period(Max - Left, ?INITIAL_WAIT).
> +
> +wait_period(1, T) ->
> +    T;
> +wait_period(N, T) when N > 1 ->
> +    wait_period(N - 1, 2 * T).
>
> Modified: couchdb/trunk/src/couchdb/couch_replicator.erl
> URL: http://svn.apache.org/viewvc/couchdb/trunk/src/couchdb/couch_replicator.erl?rev=1080950&r1=1080949&r2=1080950&view=diff
> ==============================================================================
> --- couchdb/trunk/src/couchdb/couch_replicator.erl (original)
> +++ couchdb/trunk/src/couchdb/couch_replicator.erl Sat Mar 12 17:25:33 2011
> @@ -105,7 +105,7 @@ async_replicate(#rep{id = {BaseId, Ext},
>      ChildSpec = {
>          RepChildId,
>          {gen_server, start_link, [?MODULE, Rep, []]},
> -        transient,
> +        temporary,
>          1,
>          worker,
>          [?MODULE]
>
>
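For reference, the restart delays produced by the new wait_period/1 and wait_period/2 can be worked out directly from the diff. The symbols below are illustrative notation, not names from the commit: M is max_retries, R is retries_left at the moment the {rep_error, ...} call arrives, and W_0 stands for ?INITIAL_WAIT, whose value is defined elsewhere in the module and is not part of this diff. Delays are in seconds, since start_replication/3 sleeps Wait * 1000 milliseconds. The sketch assumes retries_left is only changed by the rep_error handler quoted above.

```latex
% Illustrative notation (not from the commit):
%   M   = max_retries
%   R   = retries_left when {rep_error, ...} arrives (a restart happens only if R > 0)
%   W_0 = ?INITIAL_WAIT, in seconds (value not shown in this diff)
% wait_period/1 is called on the record *after* retries_left has been decremented to R - 1.
\begin{aligned}
k &= M - (R - 1), \qquad R = M, M-1, \dots, 1 \;\Rightarrow\; k = 1, 2, \dots, M \\
W_k &= \texttt{wait\_period}(k, W_0) = W_0 \cdot 2^{\,k-1} \\
\sum_{k=1}^{M} W_k &= W_0 \sum_{k=1}^{M} 2^{\,k-1} = W_0 \left(2^{M} - 1\right)
\end{aligned}
```

So with M = 3, a replication that keeps failing would be restarted after W_0, 2W_0 and 4W_0 seconds (7W_0 in total), and the fourth error would find RetriesLeft = 0 and take the `false` branch that cancels it. Whether retries_left is ever reset after a successful restart depends on handlers that are not part of this diff.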