Date: Mon, 18 Jul 2011 22:00:58 +0000 (UTC)
From: "James Marca (JIRA)"
To: dev@couchdb.apache.org
Message-ID: <185008119.1943.1311026458016.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1909992908.891.1311011457905.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067341#comment-13067341 ]

James Marca commented on COUCHDB-1226:
--------------------------------------

Compacting databases prior to replicating had no effect. CouchDB's beam process still grew in size with each replication, and then CouchDB shut down.

> Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
>                 Key: COUCHDB-1226
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1226
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1
>         Environment: Gentoo Linux, CouchDB built using standard ebuild. Rebuilt July 2011.
>            Reporter: James Marca
>         Attachments: topcouch.log
>
>
> When replicating databases (pull replication), CouchDB will silently crash. I suspect a memory leak is leading to the crash, because I watch the beam process slowly creep up in RAM usage, then the server dies.
> For the crashing server, the log on "debug" doesn't seem very helpful.
> It says (with manually scrubbed server address):
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [info] [<0.10032.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.10054.0>
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 2
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #2
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 10
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #10
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 20
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #20
> [Mon, 18 Jul 2011 16:23:25 GMT] [debug] [<0.10054.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 16:23:36 GMT] [info] [<0.10054.0>] recording a checkpoint for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 20
> Then, when I restart CouchDB, and restart the node.js program that is setting up the replication jobs, the crashed replication job picks up where it left off and completes just fine. Again, I scrubbed my server addresses in this log snippet:
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] 'POST' /_replicate {1,1} from "128.*.*.*"
> Headers: [{'Authorization',"Basic amFtZXM6bWdpY24wbWIzcg=="},
>           {'Connection',"close"},
>           {'Content-Type',"application/json"},
>           {'Host',"***[pullserver]***.edu"},
>           {'Transfer-Encoding',"chunked"}]
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] OAuth Params: []
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 17:22:53 GMT] [info] [<0.3562.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.3580.0>
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 37
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #37
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 47
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #47
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 57
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #57
> [Mon, 18 Jul 2011 17:23:01 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #62
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 62
> [Mon, 18 Jul 2011 17:23:22 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #78
> [Mon, 18 Jul 2011 17:23:24 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:29 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78
> [Mon, 18 Jul 2011 17:23:57 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #255
> [Mon, 18 Jul 2011 17:24:02 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:02 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: Finishing
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3562.0>] 128.*.*.* - - 'POST' /_replicate 200
> When I let that replication program run and watched top, CouchDB's total share of RAM crept up to 70%, and then it crashed.
> Again, the log on the crashing server isn't helpful (more or less the same as above).
> The replication program gets through about 8 to 12 databases before it crashes.
> Each database (when replicated to the target server) takes up on average around 700MB, un-compacted.
> The databases are all similar (annual data for detectors), with one doc per day's data. Each document is around 700K.
> If there is any more information (or more helpful information) I can provide, please let me know.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
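
For readers trying to reproduce the report: the reporter's node.js program is not shown, so the following is only a Python sketch of the kind of request the log implies. It assumes a pull replication posted to /_replicate with a JSON body, with "create_target" set because the replication id in the log ends in "+create_target". The function names, the "send" callback, and the host "http://source.example:5984" are hypothetical placeholders, since the real server names were scrubbed.

```python
import json
from urllib.parse import quote


def build_pull_replication(source_server, db_name):
    """Return (path, headers, body) for a pull replication of db_name.

    source_server is a placeholder such as "http://source.example:5984";
    the real host names are not known from the report.
    """
    # Database names containing "/" must be percent-encoded in the URL,
    # as in "vdsdata%2fd12%2f2007%2f1210882" in the log.
    source_url = source_server.rstrip("/") + "/" + quote(db_name, safe="")
    body = {
        "source": source_url,   # remote database that is pulled from
        "target": db_name,      # local database that is written to
        "create_target": True,  # assumed from "+create_target" in the log
    }
    headers = {"Content-Type": "application/json"}
    return "/_replicate", headers, json.dumps(body)


def replicate_sequentially(source_server, db_names, send):
    """Run one replication at a time; stop at the first non-200 reply.

    send(path, headers, body) is a caller-supplied (hypothetical) function
    that POSTs the request to the pulling server and returns the HTTP
    status code.
    """
    completed = []
    for name in db_names:
        path, headers, body = build_pull_replication(source_server, name)
        if send(path, headers, body) != 200:
            break
        completed.append(name)
    return completed
```

Running the jobs one at a time and recording which databases completed makes it easier to line up each replication with the growth of the beam process seen in top, which is the correlation the report describes.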