couchdb-user mailing list archives

From Carl Humphrey <c...@youdo.co.nz>
Subject couchdb restart problem.
Date Mon, 14 Jun 2010 01:03:45 GMT
Hi All,

I am experiencing a problem with CouchDB, both on version 0.10 and, since upgrading, on version 0.11.

We have a process that generates lots of small PNG images (~12 KB each) and uploads them into CouchDB
as document attachments. The process kicks off about 3000 of these uploads at a specific time of
day, and the run takes about half an hour.

Unfortunately, sometimes during this loaded period CouchDB crashes and appears to restart.
Replication is then offline, and the daemon process temporarily loses its connection to CouchDB
for a few seconds during the restart.

Here's the command we are using (based on the supplied init.d file), as it appears in the ps list:

couchdb  24095  0.0  0.0   4020   644 ?        S    Jun13   0:00 /bin/sh -e /opt/couchdb/bin/couchdb
-a /opt/couchdb/etc/couchdb/default.ini -a /opt/couchdb/etc/couchdb/local.ini -b -r 5 -p /var/couchdb/run/couchdb/couchdb.pid
-o /dev/null -e /dev/null -R
couchdb  24105  0.0  0.0   4020   356 ?        S    Jun13   0:00  \_ /bin/sh -e /opt/couchdb/bin/couchdb
-a /opt/couchdb/etc/couchdb/default.ini -a /opt/couchdb/etc/couchdb/local.ini -b -r 5 -p /var/couchdb/run/couchdb/couchdb.pid
-o /dev/null -e /dev/null -R
couchdb  24106  1.8  0.2 330228 41784 ?        Sl   Jun13  22:56      \_ /opt/erlang_R13B03/lib/erlang/erts-5.7.4/bin/beam.smp
-Bd -K true -- -root /opt/erlang_R13B03/lib/erlang -progname erl -- -home /home/couchdb --
-noshell -noinput -sasl errlog_type error -couch_ini /opt/couchdb/etc/couchdb/default.ini
/opt/couchdb/etc/couchdb/local.ini /opt/couchdb/etc/couchdb/default.ini /opt/couchdb/etc/couchdb/local.ini
-s couch -pidfile /var/couchdb/run/couchdb/couchdb.pid -heart
couchdb  24122  0.0  0.0   3784   504 ?        Ss   Jun13   0:00          \_ heart -pid 24106
-ht 11
couchdb  24127  0.0  0.0  10640   524 ?        Ss   Jun13   0:00          \_ inet_gethost
4
couchdb  24128  0.0  0.0  12736   628 ?        S    Jun13   0:00              \_ inet_gethost
4
couchdb  24638  0.0  0.0  12736   624 ?        S    Jun13   0:00              \_ inet_gethost
4

During one of these crashes we see that process 24106 and everything below it restart, while
the processes above it still show a start date of Jun13, for instance.

There is nothing in the CouchDB logs that I can find.

This happens with fsync per commit both on and off.

Questions

1 - What's the best way to find out why the crash is occurring? Should I be running without
the -o /dev/null -e /dev/null -R options?

2 - Does anyone know why CouchDB would be crashing under load?

3 - Would it be wise to try trunk instead of 0.11?

I'm sure we can alleviate the load and speed things up greatly by using batch inserts, but I still
don't feel comfortable seeing CouchDB restart itself when it hits a little write load.
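
As a rough illustration of the batch inserts mentioned above: CouchDB's _bulk_docs endpoint accepts many documents in a single POST, and attachments can be carried inline as base64 under _attachments. The sketch below is in Python purely for illustration. The helper names, document ids, filename, and batch size of 100 are hypothetical and not taken from the setup described here. It only builds the request bodies and does not send them.

```python
import base64
import json


def build_bulk_payload(images):
    """Build a _bulk_docs request body from (doc_id, filename, png_bytes) tuples."""
    docs = []
    for doc_id, filename, png_bytes in images:
        docs.append({
            "_id": doc_id,
            "_attachments": {
                filename: {
                    "content_type": "image/png",
                    # Inline attachments are sent base64-encoded.
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                }
            },
        })
    return {"docs": docs}


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical stand-in data: 3000 tiny placeholder "images".
    images = [("img-%04d" % i, "image.png", b"\x89PNG placeholder") for i in range(3000)]
    bodies = [json.dumps(build_bulk_payload(batch)) for batch in chunked(images, 100)]
    # Each body would be POSTed to <server>/<database>/_bulk_docs
    # with Content-Type: application/json (server and database are assumptions).
    print(len(bodies), "request bodies prepared")
```

With a batch size of 100, the 3000 uploads become 30 requests instead of 3000. A suitable batch size depends on attachment size and would need to be found by experiment.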

I really like CouchDB. It's a great solution to the replication problem it's being used to solve
here, and I'm keen to work out what's going on so we can keep using it.

Cheers
Carl.
