incubator-couchdb-user mailing list archives

From Octavian Damiean <mainer...@gmail.com>
Subject Re: random couch crash
Date Tue, 07 Aug 2012 20:21:09 GMT
Hello Stephen,

Just open the log in "less" and let it wait for changes (press F inside
less, or start it as "less +F"). That way you can inspect what it does.

Cheers, Octavian
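
For anyone who would rather script the log watching suggested above, here is a minimal sketch of the same idea in Python. It only imitates what "less +F" or "tail -f" do by polling the file; the helper names and the log path in the usage comment are assumptions for illustration, not something taken from this thread.

```python
import os
import time


def read_new_lines(handle):
    """Return the complete lines appended to the file since the last call."""
    lines = []
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break  # nothing new yet
        if not line.endswith("\n"):
            # Partial line: rewind and wait for the writer to finish it.
            handle.seek(pos)
            break
        lines.append(line.rstrip("\n"))
    return lines


def follow(path, poll_seconds=0.5):
    """Yield lines appended to `path`, polling roughly like `tail -f`."""
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, os.SEEK_END)  # skip what is already in the log
        while True:
            for line in read_new_lines(handle):
                yield line
            time.sleep(poll_seconds)


# Usage (the path is an assumption; check the [log] section of your config):
# for line in follow("/var/log/couchdb/couch.log"):
#     print(line)
```

Starting the follower before triggering compaction means that anything couch logs about it is captured, even if the run is over too quickly to catch in Futon.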

On Tue, Aug 7, 2012 at 10:18 PM, stephen bartell <snbartell@gmail.com> wrote:

> We don't even "think" it started.  After starting compaction we looked at
> the status in Futon and nothing came up.  I say "think" because compaction
> can finish too quickly for us to click over to the status page and watch it
> start and end.  But for a db of this size it should have taken ~5-10 sec.
> So we assumed it failed and went on to destroying and rebuilding the db.
>
>
> On Aug 7, 2012, at 1:11 PM, Robert Newson wrote:
>
> >
> > Did compaction complete, though? I wasn't thinking of reducing the file
> > size, but of being able to successfully read all live data and write it
> > back out again.
> >
> > B.
> >
> > On 7 Aug 2012, at 21:01, stephen bartell wrote:
> >
> >> I'll consider delayed_commits.
> >>
> >> The database was 85MB before compaction. We ran compact and it was
> >> still 85MB.  So compact didn't work.  The same db on other servers will
> >> compact to roughly a tenth of its original size.
> >>
> >>
> >>
> >>
> >>> I strongly suggest disabling delayed_commits on general principles
> >>> (what's written should stay written). Are you able to compact the
> >>> database(s) that give this error?
> >>>
> >>> B.
> >>>
> >>> On 7 Aug 2012, at 18:42, stephen bartell wrote:
> >>>
> >>>> delayed_commits = true
> >>>>
> >>>> Stephen Bartell
> >>>>
> >>>> On Aug 7, 2012, at 10:39 AM, Robert Newson wrote:
> >>>>
> >>>>> Are you running with delayed_commits=true or false?
> >>>>>
> >>>>> B.
> >>>>>
> >>>>> On 7 Aug 2012, at 18:27, stephen bartell wrote:
> >>>>>
> >>>>>>
> >>>>>>> Hi Stephen,
> >>>>>>>
> >>>>>>> Can you tell us any more about the context, or did you start seeing
> >>>>>>> these in the logs?
> >>>>>>
> >>>>>> Sure, here's some context.  This couch is part of a demo server.
> >>>>>> It travels a lot and is cycled a lot.  There is one physical server,
> >>>>>> which runs nginx (serving web apps and reverse proxying for couch),
> >>>>>> couchdb for persistence, and numerous programs which read and write to
> >>>>>> couch.  Traffic on couch can get very heavy.
> >>>>>>
> >>>>>> I didn't first see this in the logs.  Some of the web apps would
> >>>>>> grind to a halt, nginx would return 404, and then eventually couch would
> >>>>>> restart.  This would happen every couple of minutes.
> >>>>>>
> >>>>>>> By chance do you have a scenario that reproduces this? Was this db
> >>>>>>> compacted or replicated from elsewhere?
> >>>>>>
> >>>>>> I wish I had a pliable scenario other than sending the server
> >>>>>> through taxi cabs, airlines, and pulling the power cord several times a
> >>>>>> day.  We haven't seen this on any of our production servers.
> >>>>>> This server was not subject to any replication.  Most databases on
> >>>>>> it are compacted often.
> >>>>>>
> >>>>>> Last night we were able to drill down to one particular program
> >>>>>> which was triggering the crash.  One by one, we backed up, deleted, and
> >>>>>> rebuilt the databases that program touched.  There was one database which
> >>>>>> seemed to be the culprit; let's call it History.  History is a dumping
> >>>>>> ground for stale docs from another db. History is almost always written to,
> >>>>>> and rarely read from.  We don't compact History since all docs in it are
> >>>>>> one revision deep.  We never replicate to or from it.  The only reason we
> >>>>>> deem History the culprit is that after rebuilding it, there hasn't been
> >>>>>> a crash for over 12 hours.
> >>>>>>
> >>>>>> I have an additional question.  Is it possible to turn couch
> >>>>>> logging off entirely, or would redirecting to /dev/null suffice?  When couch
> >>>>>> would crash, hundreds of MB of crap would get dumped to the log. (
> >>>>>> {{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' … ,0,3,232>>}}).
> >>>>>> Right when this dump occurred, the cpu spiked and the server began its
> >>>>>> downward descent.
> >>>>>>
> >>>>>> Best
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Bob
> >>>>>>> On Aug 7, 2012, at 2:06 AM, stephen bartell <snbartell@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi all, could someone help shed some light on this crash I'm
> >>>>>>>> having?  I'm on v1.2, Ubuntu 11.04.
> >>>>>>>>
> >>>>>>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server <0.492.0> terminating
> >>>>>>>> ** Last message in was {pread_iolist,88385709}
> >>>>>>>> ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
> >>>>>>>>                       93302896}
> >>>>>>>> ** Reason for termination ==
> >>>>>>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
> >>>>>>>> [{couch_file,read_raw_iolist_int,3},
> >>>>>>>> {couch_file,maybe_read_more_iolist,4},
> >>>>>>>> {couch_file,handle_call,3},
> >>>>>>>> {gen_server,handle_msg,5},
> >>>>>>>> {proc_lib,init_p_do_apply,3}]}
> >>>>>>>>
> >>>>>>>> I'm not too familiar with Erlang, but what I gathered from the
> >>>>>>>> source is that the `pread_iolist` function is used when reading
> >>>>>>>> anything from disk.  So I think this might be a corrupt db problem.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Stephen Bartell
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>

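The thread leaves open whether compaction actually ran, since it can finish before the Futon status page shows it. As a rough sketch only: the script below starts compaction over HTTP (POST /{db}/_compact), polls the `compact_running` field of the database info document, and compares `disk_size` before and after. The server address, the absence of authentication, the database name in the usage comment, and the helper names are all assumptions for illustration.

```python
import json
import time
import urllib.request

COUCH_URL = "http://127.0.0.1:5984"  # assumption: default local address, no auth


def db_info(db):
    """Fetch the database info document (GET /{db})."""
    with urllib.request.urlopen(f"{COUCH_URL}/{db}") as resp:
        return json.load(resp)


def start_compaction(db):
    """Request compaction (POST /{db}/_compact with a JSON content type)."""
    req = urllib.request.Request(
        f"{COUCH_URL}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def size_report(before, after):
    """Compare the disk_size fields of two info documents."""
    old, new = before["disk_size"], after["disk_size"]
    return {"before": old, "after": new, "shrunk": new < old}


def compact_and_wait(db, poll_seconds=0.5, timeout=120):
    """Start compaction, then poll until compact_running is no longer true."""
    before = db_info(db)
    start_compaction(db)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        after = db_info(db)
        if not after.get("compact_running"):
            return size_report(before, after)
        time.sleep(poll_seconds)
    raise TimeoutError(f"compaction of {db} still running after {timeout}s")


# Usage ("history" is a hypothetical database name):
# print(compact_and_wait("history"))
```

An unchanged size does not by itself prove that compaction failed, so it is worth reading the report alongside the couch log for the same time window.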