Subject: Re: random couch crash
From: Robert Newson
Date: Tue, 7 Aug 2012 18:39:24 +0100
To: user@couchdb.apache.org

Are you running with delayed_commits=true or false?

B.

On 7 Aug 2012, at 18:27, stephen bartell wrote:

>
>> Hi Stephen,
>>
>> Can you tell us anymore about the context, or did you start seeing these in the logs?
>
> Sure, here's some context. This couch is part of a demo server. It travels a lot and is cycled a lot.
> There is one physical server; it consists of nginx (serving web apps and reverse proxying for couch), couchdb for persistence, and numerous programs which read and write to couch. Traffic on couch can get very heavy.
>
> I didn't first see this in the logs. Some of the web apps would grind to a halt, nginx would return 404, and then eventually couch would restart. This would happen every couple of minutes.
>
>> By chance do you have a scenario that reproduces this? Was this db compacted or replicated from elsewhere?
>
> I wish I had a pliable scenario other than sending the server through taxi cabs, airlines, and pulling the power cord several times a day. We haven't seen this on any of our production servers.
> This server was not subject to any replication. Most databases on it are compacted often.
>
> Last night we were able to drill down to one particular program which was triggering the crash. One by one, we backed up, deleted, and rebuilt the databases that program touched. There was one database which seemed to be the culprit; let's call it History. History is a dumping ground for stale docs from another db. History is almost always written to, and rarely read from. We don't compact History since all docs in it are one revision deep. We never replicate to or from it. The only reason we deem History the culprit is that after rebuilding it, there hasn't been a crash for over 12 hours.
>
> I have an additional question. Is it possible to turn couch logging off entirely, or would redirecting to /dev/null suffice? When couch would crash, hundreds of MB of crap would get dumped to the log ({{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' … ,0,3,232>>}}).
> Right when this dump occurred, the CPU spiked and the server began its descent.
>
> Best
>
>>
>> Thanks,
>>
>> Bob
>> On Aug 7, 2012, at 2:06 AM, stephen bartell wrote:
>>
>>> Hi all, could someone help shed some light on this crash I'm having? I'm on v1.2, Ubuntu 11.04.
>>>
>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server <0.492.0> terminating
>>> ** Last message in was {pread_iolist,88385709}
>>> ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
>>> 93302896}
>>> ** Reason for termination ==
>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
>>> [{couch_file,read_raw_iolist_int,3},
>>> {couch_file,maybe_read_more_iolist,4},
>>> {couch_file,handle_call,3},
>>> {gen_server,handle_msg,5},
>>> {proc_lib,init_p_do_apply,3}]}
>>>
>>> I'm not too familiar with Erlang, but what I gathered from the src was that the `pread_iolist` function is used when reading anything from the disk. So I think this might be a corrupt db problem.
>>>
>>> Thanks,
>>> Stephen Bartell