couchdb-user mailing list archives

From stephen bartell <snbart...@gmail.com>
Subject Re: random couch crash
Date Tue, 07 Aug 2012 17:27:26 GMT

> Hi Stephen,
> 
> Can you tell us any more about the context, or did you start seeing these in the logs?

Sure, here's some context.  This couch is part of a demo server.  It travels a lot and is
power-cycled a lot.  There is one physical server, which runs nginx (serving web apps and reverse
proxying for couch), CouchDB for persistence, and numerous programs that read from and write to
couch.  Traffic on couch can get very heavy.

The first sign wasn't in the logs.  Some of the web apps would grind to a halt, nginx would
return 404, and then eventually couch would restart.  This would happen every couple of minutes.


> By chance do you have a scenario that reproduces this? Was this db compacted or replicated
> from elsewhere?

I wish I had a repeatable scenario other than sending the server through taxi cabs and airlines
and pulling the power cord several times a day.  We haven't seen this on any of our production
servers.
This server was not subject to any replication.  Most databases on it are compacted often.
 

Last night we were able to drill down to one particular program that was triggering the crash.
One by one, we backed up, deleted, and rebuilt the databases that program touched.  There
was one database that seemed to be the culprit; let's call it History.  History is a dumping
ground for stale docs from another db.  It is almost always written to and rarely read
from.  We don't compact History since all docs in it are one revision deep.  We never replicate
to or from it.  The only reason we consider History the culprit is that, since we rebuilt it,
there hasn't been a crash for over 12 hours.

I have an additional question.  Is it possible to turn couch logging off entirely, or would
redirecting it to /dev/null suffice?  Whenever couch crashed, hundreds of MB of crap would get
dumped to the log ({{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' …
,0,3,232>>}}).  Right when this dump occurred, the CPU spiked and the server started going
downhill.
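A quick Python check, offered only as an illustration, decodes the few bytes left visible at either end of the dumped binary (the middle is elided above, so nothing here says what the whole blob was). The leading bytes are printable ASCII, which may hint that the read returned something other than the expected on-disk data; reading the trailing three bytes as a big-endian integer is purely an assumption made for the example.

```python
# Bytes visible at the start and end of the badmatch binary in the crash
# report; everything in between was elided from the post.
head = bytes([32, 50, 48, 48, 10])
tail = bytes([0, 3, 232])

# The leading bytes are printable ASCII: a space, the digits "200", a newline.
print(repr(head.decode("ascii")))  # ' 200\n'

# Assumption: interpreting the trailing bytes as a big-endian unsigned int.
print(int.from_bytes(tail, "big"))  # 1000
```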

Best

> 
> Thanks,
> 
> Bob
> On Aug 7, 2012, at 2:06 AM, stephen bartell <snbartell@gmail.com> wrote:
> 
>> Hi all, could someone help shed some light on this crash I'm having?  I'm on v1.2,
>> Ubuntu 11.04.
>> 
>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server <0.492.0> terminating
>> ** Last message in was {pread_iolist,88385709}
>> ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
>>                             93302896}
>> ** Reason for termination == 
>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
>>   [{couch_file,read_raw_iolist_int,3},
>>    {couch_file,maybe_read_more_iolist,4},
>>    {couch_file,handle_call,3},
>>    {gen_server,handle_msg,5},
>>    {proc_lib,init_p_do_apply,3}]}
>> 
>> I'm not too familiar with Erlang, but what I gathered from the source is that the `pread_iolist`
>> function is used when reading anything from the disk.  So I think this might be a corrupt
>> db problem.
>> 
>> Thanks,
>> Stephen Bartell
> 

