Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
Date: Mon, 18 Jul 2011 22:00:58 +0000 (UTC)
From: "James Marca (JIRA)" <jira@apache.org>
To: dev@couchdb.apache.org
Message-ID: 
 <185008119.1943.1311026458016.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <1909992908.891.1311011457905.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to
 crash.  I *suspect* a memory leak of some kind
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067341#comment-13067341 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------

Compacting the databases prior to replicating had no effect.  CouchDB's beam process still grew in size with each replication, and then CouchDB shut down.
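
For reference, a minimal sketch of how compaction can be requested before a replication run, using CouchDB's POST /{db}/_compact endpoint. The function name and the localhost address are assumptions for illustration, not taken from the report; authentication is omitted. The request is only built here, not sent.

```python
import urllib.request
from urllib.parse import quote


def build_compact_request(base_url, db_name):
    """Build (but do not send) a compaction request for one database.

    base_url and db_name are illustrative parameters, not from the report.
    """
    # Database names containing "/" must be percent-encoded in the URL path.
    encoded_name = quote(db_name, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{encoded_name}/_compact",
        data=b"",
        method="POST",
        # CouchDB expects a JSON content type on _compact requests.
        headers={"Content-Type": "application/json"},
    )


# Hypothetical local server; the database name matches the log excerpts.
req = build_compact_request("http://localhost:5984", "vdsdata/d12/2007/1210882")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen(req) would actually trigger compaction; compaction runs in the background, so a caller would need to poll the database info before starting replication.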


> Replication causes CouchDB to crash.  I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
>                 Key: COUCHDB-1226
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1226
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1
>         Environment: Gentoo Linux, CouchDB built using standard ebuild.  Rebuilt July 2011.
>            Reporter: James Marca
>         Attachments: topcouch.log
>
>
> When replicating databases (pull replication), CouchDB silently crashes.  I suspect a memory leak is leading to the crash, because I can watch the beam process's RAM usage slowly creep up until the server dies.
> For the crashing server, the log at "debug" level doesn't seem very helpful.  It says (with the server address manually scrubbed):
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [info] [<0.10032.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.10054.0>
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 2
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #2
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 10
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #10
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 20
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #20
> [Mon, 18 Jul 2011 16:23:25 GMT] [debug] [<0.10054.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 16:23:36 GMT] [info] [<0.10054.0>] recording a checkpoint for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 20
> Then, when I restart CouchDB and the node.js program that sets up the replication jobs, the crashed replication job picks up where it left off and completes just fine.  Again, I scrubbed my server addresses in this log snippet:
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] 'POST' /_replicate {1,1} from "128.*.*.*"
> Headers: [{'Authorization',"Basic amFtZXM6bWdpY24wbWIzcg=="},
>           {'Connection',"close"},
>           {'Content-Type',"application/json"},
>           {'Host',"***[pullserver]***.edu"},
>           {'Transfer-Encoding',"chunked"}]
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] OAuth Params: []
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 17:22:53 GMT] [info] [<0.3562.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.3580.0>
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 37
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #37
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 47
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #47
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 57
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #57
> [Mon, 18 Jul 2011 17:23:01 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #62
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 62
> [Mon, 18 Jul 2011 17:23:22 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #78
> [Mon, 18 Jul 2011 17:23:24 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:29 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78
> [Mon, 18 Jul 2011 17:23:57 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #255
> [Mon, 18 Jul 2011 17:24:02 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:02 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: Finishing
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3562.0>] 128.*.*.* - - 'POST' /_replicate 200
> With that replication program left running, I watched top: CouchDB's total share of RAM crept up to 70%, and then it crashed.
> Again, the log on the crashing server isn't helpful (more or less the same as above).
> The replication program gets through about 8 to 12 databases before CouchDB crashes.
> Each database (when replicated to the target server) takes up on average around 700MB, un-compacted.
> The databases are all similar (annual data for detectors), with one doc per day's data.  Each document is around 700K.
> If there is any more information (or more helpful information) I can provide, please let me know.
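
For anyone trying to reproduce this, the pull replication described above can be sketched as a POST to /_replicate on the target server. This is a guess at the shape of the request, not the reporter's node.js program: the "+create_target" suffix on the replication ID in the logs suggests create_target was set, the host names below are placeholders, and authentication is omitted. The request is only built here, not sent.

```python
import json
import urllib.request
from urllib.parse import quote


def build_pull_replication_request(target_base_url, source_base_url, db_name):
    """Build (but do not send) a pull-replication request.

    The request goes to the target server, which pulls from the remote source.
    All parameter names are illustrative.
    """
    # The remote source is addressed by URL, so "/" in the name is encoded.
    source_url = f"{source_base_url.rstrip('/')}/{quote(db_name, safe='')}"
    body = {
        "source": source_url,
        "target": db_name,      # local database on the server receiving the POST
        "create_target": True,  # assumption, based on the replication ID suffix
    }
    return urllib.request.Request(
        url=f"{target_base_url.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder hosts; the database name matches the log excerpts.
req = build_pull_replication_request(
    "http://localhost:5984",
    "http://source.example.edu:5984",
    "vdsdata/d12/2007/1210882",
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Issuing one such request per database in sequence, while watching the beam process in top, should approximate the workload described in the report.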

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira