couchdb-user mailing list archives

From James Marca <jma...@translab.its.uci.edu>
Subject Re: couchdb crashes silently
Date Wed, 18 Sep 2013 17:10:05 GMT
Dear list,

For future reference, I think my problem is solved, and it doesn't
appear to be a CouchDB or Erlang thing, but rather a library/Gentoo
Linux issue.

This is a Gentoo Linux box, and Gentoo likes to be rebuilt from top to
bottom every 6 months or so, so I bit the bullet and did that.  In the
process I noticed messages here and there about links to the ICU library
within couchdb that required a rebuild of couchdb.  So, wildly
guessing, I *think* that was the problem: an older build of ICU was
being used during the couchdb build, but was incompatible with some
other, more recently built system library.
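
For anyone who wants to test a guess like this before rebuilding
everything, one rough way is to ask the dynamic linker which libraries
a given file fails to resolve.  This is only a sketch: the function
names are made up for illustration, and the path to whichever couchdb
shared object links against ICU depends on the install, so it is left
as a placeholder.

```shell
#!/bin/sh
# Keep only the lines of ldd-style output (read from stdin) that report
# an unresolved dependency, e.g. "libfoo.so.N => not found".
unresolved_deps() {
    grep 'not found'
}

# Print the unresolved dependencies of one binary or shared object.
# Prints nothing when every dependency resolves (or ldd cannot read the file).
check_links() {
    ldd "$1" 2>/dev/null | unresolved_deps
}

# Example (placeholder path, adjust to the local install):
#   check_links /path/to/couchdb/shared-object.so
```

Empty output only means the linker can find every library by name; it
does not rule out a subtler incompatibility between library builds.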

Or perhaps it was something else.  Regardless, a rebuild of everything
solved the problems I was having. Been stable for a few hours now with
about twice the load that was crashing it before.

Thanks,

James Marca

On Mon, Sep 16, 2013 at 08:28:09PM +0200, Dave Cottlehuber wrote:
> My gut feel is that some OS thing is killing off beam and the usual
> suspect for that is OOM. I see you've noted nothing wrt that in the
> logs though.
> 
> On ubuntu > 12.x this works:
> 
> ps -ef |grep beam
> # you'll see 2 processes, so do this for both pids
> cat /proc/$PID/oom_score
> 124
> # echo '-1000' > /proc/$PID/oom_score_adj
> # cat /proc/$PID/oom_score
> 
> 
> only other advice I can offer is to log in & run as sudo <couchdb_user>
> `couchdb -i` for a while, it's interactive mode and *maybe* something
> useful will be left…
> 
> 
> 
> On 16 September 2013 18:59, James Marca <jmarca@translab.its.uci.edu> wrote:
> > On Sun, Sep 15, 2013 at 10:10:24PM -0700, James Marca wrote:
> >> On Sun, Sep 15, 2013 at 08:04:27PM +0200, Dave Cottlehuber wrote:
> >> > NIF scheduler issues could be a reasonable suspect;
> >> >
> >> >  heart: Fri Sep 13 20:59:36 2013: heart-beat time-out, no activity for
> >> > 15 seconds
> >> >
> >> > 15 seconds is a *long* time however.
> >> >
> >> > 1.4.0 needs 14B04 or higher I think due to one of our dependencies, so
> >> > I'd suggest reverting back to that & seeing if you are having any
> >> > other issues.
> >> >
> >> > Also, probably unrelated, why is kernel polling disabled?
> >>
> >> Honestly, on my gentoo boxes I just use the ebuild.  I have no idea
> >> why kernel polling is false...it is whatever the default is in the
> >> ebuild I guess.  I have no clue about whether kpoll should be enabled,
> >> so I'm trusting the default.
> >
> > Correction: kernel polling is enabled.  The kpoll option is set when
> > building, and /usr/bin/couchdb has +K true.  If I invoke erl with +K true, then
> > kpoll=true.  One thing I do not have, though, is HiPE enabled.
> >
> > --
> > This message has been scanned for viruses and
> > dangerous content by MailScanner, and is
> > believed to be clean.
> >

