couchdb-user mailing list archives

From Benoit Chesneau <bchesn...@gmail.com>
Subject Re: couchdb crashes silently
Date Thu, 19 Sep 2013 05:17:27 GMT
On Wednesday, September 18, 2013, James Marca wrote:

> Dear list,
>
> For future reference, I think my problem is solved, and it doesn't
> appear to be a CouchDB or Erlang thing, but rather a library/Gentoo
> Linux issue.
>
> This is a Gentoo Linux box, and Gentoo likes to be rebuilt from top to
> bottom every 6 months or so, I bit the bullet and did that.  In the
> process I noticed here and there messages about links to icu library
> within couchdb that required a rebuild of couchdb.  So, wildly
> guessing, I *think* that was the problem...an older build of icu was
> being used during the couchdb build, but was incompatible with some
> other, more recently built system library.
>
> Or perhaps it was something else.  Regardless, a rebuild of everything
> solved the problems I was having. Been stable for a few hours now with
> about twice the load that was crashing it before.
>
> Thanks,
>
> James Marca



The odd thing is that it was looking in the lost+found folder, as if your
files had been deleted or something happened during a restart and/or a
filesystem check. IMO you could find such events in the system logs.

It would also explain why a rebuild fixed things.
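
A rough sketch of the log check suggested above, in case it helps. The
helper name scan_log, the pattern list, and the log locations are only
assumptions for illustration; the messages and paths vary by kernel and
distro, so adjust them to whatever your box actually writes.

```shell
# Hypothetical helper: keep only the log lines that hint at an OOM kill
# or at filesystem checking/recovery. With no file arguments it reads stdin.
scan_log() {
    grep -i -E 'out of memory|oom-killer|killed process|fsck|lost\+found|EXT[234]-fs error' "$@"
}

# Usage (assumed locations, not verified for this machine):
#   dmesg | scan_log
#   scan_log /var/log/messages
```

Any hits timestamped near the crashes would be worth a closer look.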

- benoit

>
> On Mon, Sep 16, 2013 at 08:28:09PM +0200, Dave Cottlehuber wrote:
> > My gut feel is that some OS thing is killing off beam and the usual
> > suspect for that is OOM. I see you've noted nothing about that in the
> > logs though.
> >
> > On ubuntu > 12.x this works:
> >
> > ps -ef |grep beam
> > # you'll see 2 processes, so do this for both pids
> > cat /proc/$PID/oom_score
> > 124
> > # echo '-1000' > /proc/$PID/oom_score_adj
> > # cat /proc/$PID/oom_score
> >
> >
> > only other advice I can offer is to login & run as sudo <couchdb_user>
> > `couchdb -i` for a while, it's interactive mode and *maybe* something
> > useful will be left…
> >
> >
> >
> > On 16 September 2013 18:59, James Marca <jmarca@translab.its.uci.edu> wrote:
> > > On Sun, Sep 15, 2013 at 10:10:24PM -0700, James Marca wrote:
> > >> On Sun, Sep 15, 2013 at 08:04:27PM +0200, Dave Cottlehuber wrote:
> > >> > NIF scheduler issues could be a reasonable suspect;
> > >> >
> > >> >  heart: Fri Sep 13 20:59:36 2013: heart-beat time-out, no activity
> > >> > for 15 seconds
> > >> >
> > >> > 15 seconds is a *long* time however.
> > >> >
> > >> > 1.4.0 needs 14B04 or higher I think due to one of our dependencies,
> > >> > so I'd suggest reverting back to that & seeing if you are having
> > >> > any other issues.
> > >> >
> > >> > Also, probably unrelated, why is kernel polling disabled?
> > >>
> > >> Honestly, on my gentoo boxes I just use the ebuild.  I have no idea
> > >> why kernel polling is false...it is whatever the default is in the
> > >> ebuild I guess.  I have no clue about whether kpoll should be enabled,
> > >> so I'm trusting the default.
> > >
> > > Correction: kernel polling is enabled.  The kpoll option is set when
> > > building, and /usr/bin/couchdb has +K true.  If I invoke erl with
> > > +K true, then kpoll=true.  One thing I do not have enabled, though,
> > > is HiPE.
> > >
> > > --
> > > This message has been scanned for viruses and
> > > dangerous content by MailScanner, and is
> > > believed to be clean.
> > >
