incubator-couchdb-user mailing list archives

From Jason Smith <...@apache.org>
Subject Re: Respawning server died, can't figure out why
Date Wed, 14 Aug 2013 01:02:45 GMT
Hi, Nathan. Is the couch under heavy load? Thanks.


On Wed, Aug 14, 2013 at 6:15 AM, Joan Touzet <wohali@apache.org> wrote:

> On Tue, Aug 13, 2013 at 02:49:28PM -0500, Nathan Vander Wilt wrote:
> > I've got 1.7GB disk free and 2GB of memory available at the moment, so
> > it doesn't seem to be either of those. (I could not find any out-of-memory
> > process kill logs in /var/log/syslog.) The only clue I can find is in
> > couchdb.stderr:
> >     heart_beat_kill_pid = 1390
> >     heart_beat_timeout = 11
> >     heart: Tue Aug 13 18:34:21 2013: heart-beat time-out, no activity for 15 seconds
> >     Killed
>
> So 15s of system clock time passed without Erlang's heart receiving a
> ping back. There are a number of possibilities; for instance, if this is a
> VM and the clock was advanced/changed by 15s to synchronize with the
> main system, heart might see that and issue a kill command. Another
> could be extremely heavy load on the system forcing the second couch
> process to get swapped out.
>
> Three suggestions:
>
>   1. Set RESPAWN_TIMEOUT to a non-zero value to force couch to restart
>      after a kill. Because of its crash-only design this is safe, and
>      since restarts are rare you're unlikely to run into serious
>      issues.
>   2. Crank up logging to debug level to see what might be going on
>      when the heartbeat fails to respond.
>   3. Add some additional system monitoring to ensure that you're not
>      overloading your system on CPU, RAM, I/O or network traffic.
>      Do you have a lot of views building / heavy system load due to
>      couchjs processes?
>
> --
> Joan Touzet | joant@atypical.net | wohali everywhere else
>
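
To make the clock-jump scenario in the quoted message concrete, here is a minimal Python sketch of a watchdog that compares timestamps from an injectable clock. It is a toy model, not Erlang's actual heart implementation; the class and function names are hypothetical, and it assumes the watchdog measures elapsed time from a clock that can be stepped forward, which may or may not match how heart behaves on a given Erlang release.

```python
import time


class HeartbeatWatchdog:
    """Toy heartbeat watchdog (hypothetical; not Erlang's heart)."""

    def __init__(self, timeout_s, clock=time.time):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so a clock jump can be simulated
        self.last_beat = clock()

    def beat(self):
        """Record that the monitored process is still alive."""
        self.last_beat = self.clock()

    def timed_out(self):
        """True if more than timeout_s has elapsed since the last beat."""
        return self.clock() - self.last_beat > self.timeout_s


def simulate_clock_jump(timeout_s=11, jump_s=15):
    """Return whether the watchdog fires after the clock steps forward by jump_s."""
    now = [1000.0]                    # fake clock value, in seconds
    dog = HeartbeatWatchdog(timeout_s, clock=lambda: now[0])
    dog.beat()                        # process is healthy at this point
    now[0] += jump_s                  # clock is stepped, e.g. by a VM time sync
    return dog.timed_out()
```

With the values from the log (a timeout of 11 and a 15 second gap), `simulate_clock_jump(11, 15)` reports a time-out even though the monitored process never stalled. Passing `time.monotonic` as the clock is one way a watchdog like this could avoid reacting to wall-clock adjustments.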

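For the monitoring suggestion in the quoted message, a rough starting point is to sample the load average and compare it with the number of CPUs. This is only a sketch: the function names and the threshold of 1.0 per CPU are assumptions chosen for illustration, `os.getloadavg()` is available on Unix-like systems only, and a real setup would also watch RAM, I/O and network traffic.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Normalize a load average by the number of CPUs (at least 1)."""
    return load_avg / max(cpu_count, 1)


def is_overloaded(load_avg, cpu_count, threshold=1.0):
    """True if the per-CPU load exceeds the (assumed) threshold."""
    return load_per_cpu(load_avg, cpu_count) > threshold


def sample():
    """Read the 1-minute load average and report whether it looks high."""
    one_min, _five_min, _fifteen_min = os.getloadavg()   # Unix only
    cpus = os.cpu_count() or 1
    return one_min, cpus, is_overloaded(one_min, cpus)
```

Calling `sample()` from a cron job or a small loop and logging the result alongside timestamps would make it easier to see whether load spikes line up with the heart-beat time-outs in couchdb.stderr.
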