couchdb-user mailing list archives

From Anand Chitipothu <anandol...@gmail.com>
Subject CouchDB random crashes
Date Wed, 07 Sep 2011 14:44:17 GMT
Hi,

We are using CouchDB in production at openlibrary.org. We have a couple
of databases with 25M docs and views with about 80M rows.

There was a crash, and after the restart the CouchDB server started
recomputing all views of all the databases. Some time before the crash
I ran _view_cleanup on one of the databases to delete unused view
files, but I'm not sure whether that caused it.

We were using CouchDB version 1.0.1. After that crash, I copied the
databases to a new node, restored the views from a backup, and upgraded
CouchDB to 1.0.3, thinking that it would be more stable. Everything
seemed alright for a while.

I then tried to compact a view and a database, as compaction had not
been run since the db was created. That increased the load on the
machine, CouchDB crashed, and the restart started view recomputation
again.
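
For reference, the maintenance operations mentioned above (view cleanup,
database compaction, view compaction) are plain HTTP POST calls in
CouchDB 1.x. The sketch below only builds the requests and does not send
them. The server address, the database name "mydb" and the design
document name "mydesign" are placeholders, not the names used on
openlibrary.org, and admin credentials (not shown) would normally be
needed as well.

```python
import urllib.request

COUCH = "http://localhost:5984"  # assumed server address (placeholder)


def maintenance_request(db, action, design_doc=None):
    """Build (but do not send) a POST request for a CouchDB maintenance endpoint.

    action is "_view_cleanup" or "_compact"; passing design_doc together with
    "_compact" targets the view indexes of that design document.
    """
    path = f"/{db}/{action}"
    if design_doc is not None:
        path += f"/{design_doc}"
    return urllib.request.Request(
        COUCH + path,
        data=b"",
        # CouchDB expects a JSON content type on these POSTs.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "mydb" and "mydesign" are hypothetical names used only for illustration.
cleanup = maintenance_request("mydb", "_view_cleanup")
compact_db = maintenance_request("mydb", "_compact")
compact_view = maintenance_request("mydb", "_compact", design_doc="mydesign")

if __name__ == "__main__":
    for req in (cleanup, compact_db, compact_view):
        print(req.get_method(), req.full_url)
    # To actually run one of them: urllib.request.urlopen(req)
```

Compaction rewrites the whole file, so on databases of this size it is
I/O heavy, which is consistent with the load increase described above.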

I restored the views from backup again and it looked alright again for
a while. Since then I've been going through phases of random crashes
and restoring views from backup. I'm not sure what is triggering the
crashes. I tried moving back to 1.0.1, but that didn't help.

After each crash, the last error in couch.log is always the following:

[Wed, 07 Sep 2011 13:22:08 GMT] [error] [<0.77.0>] {error_report,<0.31.0>,
    {<0.77.0>,supervisor_report,
     [{supervisor,{local,couch_server_sup}},
      {errorContext,shutdown},
      {reason,reached_max_restart_intensity},
      {offender,
          [{pid,<0.29139.3>},
           {name,couch_secondary_services},
           {mfa,{couch_server_sup,start_secondary_services,[]}},
           {restart_type,permanent},
           {shutdown,infinity},
           {child_type,supervisor}]}]}}
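
In Erlang/OTP, reached_max_restart_intensity means a supervisor gave up
after its child restarted too many times in a short window, so the
report above is usually the last symptom and the original failure is
likely to be in earlier error entries. A minimal sketch for pulling
those out of couch.log, assuming every entry starts with
"[timestamp] [level] [pid]" as in the excerpt above. The function names
are hypothetical, and the first two sample lines are placeholders, not
real CouchDB output.

```python
import re

# Start of a couch.log entry: "[timestamp] [level] [<pid>] ..."
ENTRY_START = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] \[(?P<pid><[\d.]+>)\] "
)


def split_entries(lines):
    """Group raw log lines into entries; continuation lines join the previous entry."""
    entries = []
    for line in lines:
        m = ENTRY_START.match(line)
        if m:
            entries.append(
                {"ts": m["ts"], "level": m["level"], "text": line.rstrip("\n")}
            )
        elif entries:
            entries[-1]["text"] += "\n" + line.rstrip("\n")
    return entries


def errors_before_shutdown(entries, marker="reached_max_restart_intensity", limit=5):
    """Return up to `limit` error entries preceding the last entry containing `marker`."""
    last = max((i for i, e in enumerate(entries) if marker in e["text"]), default=None)
    if last is None:
        return []
    earlier = [e for e in entries[:last] if e["level"] == "error"]
    return earlier[-limit:]


# The first two lines are made-up placeholders; the rest is abridged from the report above.
sample = [
    "[Wed, 07 Sep 2011 13:22:01 GMT] [info] [<0.100.0>] placeholder info line\n",
    "[Wed, 07 Sep 2011 13:22:05 GMT] [error] [<0.101.0>] placeholder earlier error\n",
    "[Wed, 07 Sep 2011 13:22:08 GMT] [error] [<0.77.0>] {error_report,<0.31.0>,\n",
    "    {<0.77.0>,supervisor_report,\n",
    "     [{supervisor,{local,couch_server_sup}},\n",
    "      {reason,reached_max_restart_intensity},\n",
]

if __name__ == "__main__":
    for entry in errors_before_shutdown(split_entries(sample)):
        print(entry["ts"], entry["text"])
```

On a real file, split_entries(open("couch.log")) would be the input,
with the path adjusted to wherever the log lives.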

Here are the last 10K+ lines of couch.log after each crash:

http://www.archive.org/~anand/files/2011-09-07-couchdb-crash-log.txt
http://www.archive.org/~anand/files/2011-09-07-couchdb-crash2-log.txt

And the full log of the most recent crash:

http://www.archive.org/~anand/files/2011-09-07-couchdb-crash2.log.gz

Can someone please help me fix this?

Thanks,
Anand
