From: Zachary Zolton
Date: Mon, 5 Oct 2009 14:12:06 -0500
Subject: Re: couchdb and monit
To: user@couchdb.apache.org

Yeah, we've been using monit quite happily, as well:

check process couchdb with pidfile /usr/local/var/run/couchdb/couchdb.pid
  group database
  start program = "/etc/init.d/couchdb start"
  stop program = "/etc/init.d/couchdb stop"
  if failed host 127.0.0.1 port 5984 then restart
  if cpu is greater than 40% for 2 cycles then alert
  if cpu > 60% for 5 cycles then restart
  if 10 restarts within 10 cycles then timeout
  depends on data_fs

You'll note that it depends on data_fs which is an Amazon EBS drive
that also is monitored. Furthermore, we can be notified of high CPU
usage for traffic spikes...

On Mon, Oct 5, 2009 at 1:56 PM, Nicholas Orr wrote:
> I've changed mine to do the -r 5 and to send an alert if it is not running.
> as long as -r 5 does what it is supposed to do everything will be ok
> if it fails at least I'll know about it - this is where monit is useful, no
> matter how smart/capable an erlang app is "supposed" to be, I'd like to know
> if it goes down :)
>
> Nick
>
> On Tue, Oct 6, 2009 at 4:48 AM, Robert Newson wrote:
>
>> Understood. All I'm saying is that Erlang applications should already
>> have rich support for process restarting, heartbeat/keep-alive.
>>
>> monit is a generic wrapper to add those things when they are absent. A
>> correctly configured Erlang application shouldn't need monit, imo.
>>
>> B.
>>
>> On Mon, Oct 5, 2009 at 6:40 PM, Francisco Viramontes
>> wrote:
>> > I dunno but I tried with the respawn parameter for couchdb command in
>> > Gentoo but it did not work. Also I have other services set up with monit
>> > so it's more convenient for me to have everything in one place.
>> >
>> > PAco
>> > On Oct 5, 2009, at 12:22 PM, Robert Newson wrote:
>> >
>> >> Isn't couchdb (at least in the Debian package) monitored by heart?
>> >>
>> >> B.
>> >>
>> >> On Mon, Oct 5, 2009 at 6:05 PM, Nicholas Orr
>> >> wrote:
>> >>>
>> >>> great!
>> >>> i was wondering what to put for the "test" conditions.
>> >>> Yours work well, so thanks to you as well ;)
>> >>>
>> >>> Nick
>> >>>
>> >>> On Tue, Oct 6, 2009 at 4:01 AM, Francisco Viramontes
>> >>> wrote:
>> >>>
>> >>>> Nicholas
>> >>>>
>> >>>> Thanks man, it worked. I had been banging my head for a week because
>> >>>> of this
>> >>>>
>> >>>> my final monit script is
>> >>>>
>> >>>> check process couchdb
>> >>>>  with pidfile /var/run/couchdb/couchdb.pid
>> >>>>  #start program = "/etc/init.d/couchdb start"
>> >>>>  #stop program = "/etc/init.d/couchdb stop"
>> >>>>  start program = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid"
>> >>>>  stop program  = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid -d"
>> >>>>  if failed host 127.0.0.1 port 5984 then restart
>> >>>>  if failed url http://localhost:5984/ and content == '"couchdb"' then restart
>> >>>>  group couchdb
>> >>>>
>> >>>> PAco
>> >>>>
>> >>>>
>> >>>> On Oct 5, 2009, at 2:45 AM, Nicholas Orr wrote:
>> >>>>
>> >>>>> My monit script is verbatim, as monit is run as root I want couchdb
>> >>>>> run as couchdb so do the following
>> >>>>>
>> >>>>> check process couchdb with pidfile /var/run/couchdb/couchdb.pid
>> >>>>>  start program = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid"
>> >>>>>  stop program  = "/usr/bin/sudo -u couchdb /usr/bin/couchdb -b -o /dev/null -e /dev/null -p /var/run/couchdb/couchdb.pid -d"
>> >>>>>
>> >>>>> try that and see what happens...
>> >>>>>
>> >>>>> On Mon, Oct 5, 2009 at 7:49 AM, Francisco Viramontes <paco@freshout.us>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hey Guys
>> >>>>>>
>> >>>>>> has anyone tried to monitor couch with monit?
>> >>>>>>
>> >>>>>> I am using these settings and monit successfully monitors but when
>> >>>>>> couchdb dies it fails to restart the service and I can't find out why
>> >>>>>>
>> >>>>>> here is my couchdb.monitrc file:
>> >>>>>>
>> >>>>>> check process couchdb
>> >>>>>>  with pidfile /var/run/couchdb/couchdb.pid
>> >>>>>>  start program = "/etc/init.d/couchdb start"
>> >>>>>>  stop program = "/etc/init.d/couchdb stop"
>> >>>>>>  if failed host 127.0.0.1 port 5984 then restart
>> >>>>>>  if failed url http://localhost:5984/ and content == '"couchdb"' then restart
>> >>>>>>  group couchdb
>> >>>>>>
>> >>>>>> BTW I am using couch 0.9.1 and about once a day it dies on me, the
>> >>>>>> only thing I get from the log are strange erlang error messages saying
>> >>>>>> OS process timeout, anyone know what's that about?
>> >>>>>>
>> >>>>>> PAco
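
The two liveness tests used in the monit scripts in this thread (a TCP connect to port 5984, and a GET on / whose body must contain "couchdb") can be tried by hand before wiring them into monit. Below is a minimal Python sketch of roughly equivalent checks. The function names are made up for illustration and are not part of monit or CouchDB. It also assumes the root URL answers with a JSON object carrying a "couchdb" key, which makes it stricter than monit's content test (monit only matches the text of the body).

```python
import json
import socket
import urllib.request


def port_is_open(host="127.0.0.1", port=5984, timeout=5.0):
    """Rough analogue of 'if failed host 127.0.0.1 port 5984'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def looks_like_couchdb(body):
    """True if body is a JSON object with a "couchdb" key.

    Assumption: the welcome document is a JSON object containing that key.
    monit's own test is a plain match on the body text.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return isinstance(doc, dict) and "couchdb" in doc


def couchdb_is_healthy(url="http://localhost:5984/", timeout=5.0):
    """Rough analogue of the 'if failed url ... and content ==' test."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return looks_like_couchdb(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or undecodable body.
        return False


if __name__ == "__main__":
    print("port open:", port_is_open())
    print("healthy:", couchdb_is_healthy())
```

If the port check passes but the URL check fails, something other than a working CouchDB is answering on 5984, which is the case the second monit test is there to catch.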