Return-Path: X-Original-To: apmail-couchdb-dev-archive@www.apache.org Delivered-To: apmail-couchdb-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5E5697FC7 for ; Mon, 18 Jul 2011 20:26:27 +0000 (UTC) Received: (qmail 37971 invoked by uid 500); 18 Jul 2011 20:26:26 -0000 Delivered-To: apmail-couchdb-dev-archive@couchdb.apache.org Received: (qmail 37921 invoked by uid 500); 18 Jul 2011 20:26:26 -0000 Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@couchdb.apache.org Delivered-To: mailing list dev@couchdb.apache.org Received: (qmail 37911 invoked by uid 99); 18 Jul 2011 20:26:25 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 18 Jul 2011 20:26:25 +0000 X-ASF-Spam-Status: No, hits=-1997.5 required=5.0 tests=ALL_TRUSTED,FS_REPLICA,RP_MATCHES_RCVD,WEIRD_PORT X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 18 Jul 2011 20:26:20 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 206B0420FA for ; Mon, 18 Jul 2011 20:25:58 +0000 (UTC) Date: Mon, 18 Jul 2011 20:25:58 +0000 (UTC) From: "James Marca (JIRA)" To: dev@couchdb.apache.org Message-ID: <1867922661.1503.1311020758128.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <1909992908.891.1311011457905.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067271#comment-13067271 ] James Marca commented on COUCHDB-1226: -------------------------------------- from erl, I see: Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [kernel-poll:false] As to compaction prior to replication I have not tried that. However, the "source" database is created from scratch, with no edits or deletions, so it shouldn't need much compaction, and the "target" is created just prior to replication. However, I will compact all databases on the "source" machine prior to the next crash/reload/rerun cycle to see if it helps. > Replication causes CouchDB to crash. I *suspect* a memory leak of some kind > ---------------------------------------------------------------------------- > > Key: COUCHDB-1226 > URL: https://issues.apache.org/jira/browse/COUCHDB-1226 > Project: CouchDB > Issue Type: Bug > Components: Replication > Affects Versions: 1.1 > Environment: Gentoo Linux, CouchDB built using standard ebuild. Rebuilt July 2011. > Reporter: James Marca > Attachments: topcouch.log > > > When replicating databases (pull replication), CouchDB will silently crash. I suspect a memory leak is leading to the crash, because I watch the beam process slowly creep up in RAM usage, then the server dies. > For the crashing server, the log on "debug" doesn't seem very helpful. It says (with manually scrubbed server address): > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882 > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882 > [Mon, 18 Jul 2011 16:23:20 GMT] [info] [<0.10032.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.10054.0> > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 1 > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #1 > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 2 > [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #2 > [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 10 > [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #10 > [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #14 > [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 14 > [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 20 > [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #20 > [Mon, 18 Jul 2011 16:23:25 GMT] [debug] [<0.10054.0>] target doesn't need a full commit > [Mon, 18 Jul 2011 16:23:36 GMT] [info] [<0.10054.0>] recording a checkpoint for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 20 > Then, when I restart CouchDB, and restart the node.js program that is setting up the replication jobs, the crashed replication job picks up where it left off and completes just fine. Again, I scrubbed my server addresses in this log snippet.: > [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] 'POST' /_replicate {1,1} from "128.*.*.*" > Headers: [{'Authorization',"Basic amFtZXM6bWdpY24wbWIzcg=="}, > {'Connection',"close"}, > {'Content-Type',"application/json"}, > {'Host',"***[pullserver]***.edu"}, > {'Transfer-Encoding',"chunked"}] > [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] OAuth Params: [] > [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ > [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for vdsdata/d12/2007/1210882 > [Mon, 18 Jul 2011 17:22:53 GMT] [info] [<0.3562.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.3580.0> > [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 22 > [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #22 > [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 37 > [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #37 > [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 39 > [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #39 > [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 47 > [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #47 > [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 57 > [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #57 > [Mon, 18 Jul 2011 17:23:01 GMT] [debug] [<0.3580.0>] target doesn't need a full commit > [Mon, 18 Jul 2011 17:23:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57 > [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #62 > [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 62 > [Mon, 18 Jul 2011 17:23:22 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #78 > [Mon, 18 Jul 2011 17:23:24 GMT] [debug] [<0.3580.0>] target doesn't need a full commit > [Mon, 18 Jul 2011 17:23:29 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78 > [Mon, 18 Jul 2011 17:23:57 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #255 > [Mon, 18 Jul 2011 17:24:02 GMT] [debug] [<0.3580.0>] target doesn't need a full commit > [Mon, 18 Jul 2011 17:24:02 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255 > [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #347 > [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.3580.0>] target doesn't need a full commit > [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347 > [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: Finishing > [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3562.0>] 128.*.*.* - - 'POST' /_replicate 200 > Letting that replication program run, and watching top, CouchDB's total share of RAM crept up to 70%, then it crashed. > Again, the log on the crashing server isn't helpful (more or less the same as above) > The replication program gets through about 8 to 12 databases before it crashes. > Each database (when replicated to the target server) takes up on average around 700MB, un-compacted. > The databases are all similar (annual data for detectors), with one doc per day's data. Each document is around 700K. > If there is any more information (or more helpful information) I can provide, please let me know. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira