From: rnewson@apache.org
To: commits@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Date: Fri, 01 Aug 2014 09:10:28 -0000
In-Reply-To: <7ec92bc8976347e8be5a76bf4eac0b9b@git.apache.org>
References: <7ec92bc8976347e8be5a76bf4eac0b9b@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: [32/35] git commit: Rearrange and lengthen the watchdog delay

Rearrange and lengthen the watchdog delay

I did not fully appreciate that code upgrades are not atomic across all applications. This watchdog ended up causing a node to reboot into an unusable state because it killed the couch_db_update_notifier handlers before the new code had been installed for every app. This led to mem3 rapidly cycling while trying to use couch_db_update_notifier, which eventually took down the mem3 app, which in turn took down the node.
Then the node would reboot into 1202 after the databases had already upgraded their headers, which prevented the node from booting correctly. By extending the delay to five minutes and placing it before the first attempt to terminate the couch_db_update handlers, I hope to give the release enough time to complete before telling each handler to upgrade.

Project: http://git-wip-us.apache.org/repos/asf/couchdb-couch-event/repo
Commit: http://git-wip-us.apache.org/repos/asf/couchdb-couch-event/commit/707997e3
Tree: http://git-wip-us.apache.org/repos/asf/couchdb-couch-event/tree/707997e3
Diff: http://git-wip-us.apache.org/repos/asf/couchdb-couch-event/diff/707997e3

Branch: refs/heads/windsor-merge
Commit: 707997e37db11aa8194b00c0a432e49c7071b1f2
Parents: de23171
Author: Paul J. Davis
Authored: Tue Jun 18 12:14:17 2013 -0500
Committer: Robert Newson
Committed: Wed Jul 30 17:49:19 2014 +0100

----------------------------------------------------------------------
 src/couch_event_server.erl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/couchdb-couch-event/blob/707997e3/src/couch_event_server.erl
----------------------------------------------------------------------
diff --git a/src/couch_event_server.erl b/src/couch_event_server.erl
index bd291aa..1c7bcf4 100644
--- a/src/couch_event_server.erl
+++ b/src/couch_event_server.erl
@@ -123,6 +123,7 @@ code_change(_OldVsn, St, _Extra) ->
 
 
 watchdog() ->
+    timer:sleep(300000),
     Handlers = gen_event:which_handlers(couch_db_update),
     case length(Handlers) > 0 of
         true ->
@@ -133,7 +134,6 @@ watchdog() ->
         false ->
             ok
     end,
-    timer:sleep(5000),
     ?MODULE:watchdog().
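For illustration, the reordered loop can be sketched as a standalone module. This is a minimal sketch under stated assumptions, not the actual couch_event_server code. The diff does not show what the real `true` branch does, so `Action` is a hypothetical callback standing in for it. The module name `watchdog_sketch`, the `Manager` argument, and the `WATCHDOG_DELAY_MS` macro are also invented for this example. What the sketch does preserve is the ordering the commit introduces: sleep first, then inspect the handlers, then loop through a fully qualified call.

```erlang
-module(watchdog_sketch).
-export([watchdog/2, check_once/2]).

%% Five minutes, matching timer:sleep(300000) in the patch.
-define(WATCHDOG_DELAY_MS, 300000).

%% Post-patch ordering: sleep FIRST, so nothing is inspected until the
%% full delay has elapsed after the loop starts. With the old ordering
%% (check, then sleep), the first check ran immediately.
watchdog(Manager, Action) ->
    timer:sleep(?WATCHDOG_DELAY_MS),
    check_once(Manager, Action),
    %% Fully qualified call: after a code upgrade, the next iteration
    %% runs the newest loaded version of this module.
    ?MODULE:watchdog(Manager, Action).

%% One inspection pass. Equivalent to the patch's length(Handlers) > 0
%% test, written as a pattern match. Action is a hypothetical callback
%% for whatever the real watchdog does with the registered handlers.
check_once(Manager, Action) ->
    case gen_event:which_handlers(Manager) of
        [] -> ok;
        Handlers -> Action(Handlers)
    end.
```

In a shell, after `c(watchdog_sketch).`, the loop could be run in its own process with `{ok, Mgr} = gen_event:start(), spawn(watchdog_sketch, watchdog, [Mgr, fun(Hs) -> io:format("~p~n", [Hs]) end]).` Moving the sleep ahead of the check does not change the steady-state interval between passes much. Its main effect is on the first pass, which is exactly the window in which an in-progress release may not yet have installed new code for every application.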