couchdb-dev mailing list archives

From "James Marca (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind
Date Mon, 18 Jul 2011 20:01:57 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Marca updated COUCHDB-1226:
---------------------------------

    Attachment: topcouch.log

This file is an edited capture of the output of top while CouchDB handles replication.
It shows RAM usage rising gradually as CouchDB works through 7 consecutive database
replications, and CouchDB crashing on the 7th.

> Replication causes CouchDB to crash.  I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
>                 Key: COUCHDB-1226
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1226
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1
>         Environment: Gentoo Linux, CouchDB built using standard ebuild.  Rebuilt July 2011.
>            Reporter: James Marca
>         Attachments: topcouch.log
>
>
> When replicating databases (pull replication), CouchDB silently crashes.  I suspect a memory leak is leading to the crash, because I can watch the RAM usage of the beam process slowly creep up until the server dies.
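To put numbers on the creep described above, the following minimal Python sketch samples the resident set size (RSS) of every matching process from `/proc` at a fixed interval. It is Linux-only and assumes the Erlang VM appears as a process whose name starts with `beam` (for example `beam` or `beam.smp`). The helper names are hypothetical and are not part of CouchDB.

```python
import os
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB".
            return int(line.split()[1])
    return None


def beam_pids():
    """Yield PIDs whose process name starts with 'beam' (assumed Erlang VM name)."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip().startswith("beam"):
                    yield int(entry)
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue


def sample_forever(interval_s=10):
    """Print a timestamped RSS reading for each beam process every interval_s seconds."""
    while True:
        for pid in beam_pids():
            try:
                with open(f"/proc/{pid}/status") as f:
                    rss = parse_vmrss_kb(f.read())
            except OSError:
                continue
            print(f"{time.strftime('%H:%M:%S')} pid={pid} rss_kb={rss}")
        time.sleep(interval_s)
```

Calling `sample_forever()` while the replication driver runs produces one line per beam process per interval. The output is easier to line up with the CouchDB log timestamps than top's screen output.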
> On the crashing server, the log at the "debug" level doesn't seem very helpful.  It says (with the server address manually scrubbed):
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [info] [<0.10032.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.10054.0>
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 2
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #2
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 10
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #10
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 20
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #20
> [Mon, 18 Jul 2011 16:23:25 GMT] [debug] [<0.10054.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 16:23:36 GMT] [info] [<0.10054.0>] recording a checkpoint for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 20
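The last checkpoint this log records before the crash is at source update_seq 20. The following small Python sketch extracts checkpoint sequence numbers from a saved log, so you can confirm where an interrupted replication ought to resume. It assumes only the "recording a checkpoint for ... at source update_seq N" line format visible above, and the function names are hypothetical.

```python
import re

# Matches info-level checkpoint entries such as
# "recording a checkpoint for <source> -> <target> at source update_seq 20".
CHECKPOINT_RE = re.compile(
    r"recording a checkpoint for (\S+) -> (\S+)\s+at source update_seq (\d+)"
)


def checkpoints(log_text):
    """Return (source, target, update_seq) tuples for every checkpoint entry."""
    # Collapse all whitespace so an entry hard-wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return [(src, tgt, int(seq)) for src, tgt, seq in CHECKPOINT_RE.findall(flat)]


def last_checkpoint(log_text, source):
    """Return the highest update_seq checkpointed for `source`, or None if none."""
    seqs = [seq for src, _tgt, seq in checkpoints(log_text) if src == source]
    return max(seqs) if seqs else None
```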
> Then, when I restart CouchDB and restart the node.js program that sets up the replication jobs, the crashed replication job picks up where it left off and completes just fine.  Again, I scrubbed my server addresses in this log snippet:
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] 'POST' /_replicate {1,1} from "128.*.*.*"
> Headers: [{'Authorization',"Basic ***"},
>           {'Connection',"close"},
>           {'Content-Type',"application/json"},
>           {'Host',"***[pullserver]***.edu"},
>           {'Transfer-Encoding',"chunked"}]
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] OAuth Params: []
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 17:22:53 GMT] [info] [<0.3562.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.3580.0>
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 37
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #37
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 47
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #47
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 57
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #57
> [Mon, 18 Jul 2011 17:23:01 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #62
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 62
> [Mon, 18 Jul 2011 17:23:22 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #78
> [Mon, 18 Jul 2011 17:23:24 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:29 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78
> [Mon, 18 Jul 2011 17:23:57 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #255
> [Mon, 18 Jul 2011 17:24:02 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:02 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: Finishing
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3562.0>] 128.*.*.* - - 'POST' /_replicate 200
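The restart log above shows the shape of the request the reporter's node.js driver sends. It is a POST to /_replicate, and three details of the body can be read from the log:

- The source is a remote URL, with the slash-containing database name percent-encoded.
- The target is the plain local database name.
- The replication id carries a "+create_target" suffix.

The following hedged Python sketch builds an equivalent request without sending it. The function name, hosts, and credentials are hypothetical placeholders. The body fields are inferred from the log, not taken from the reporter's code.

```python
import base64
import json
import urllib.parse
import urllib.request


def pull_replication_request(local_couch, remote_couch, db_name, user, password):
    """Build (but do not send) a POST /_replicate request that pulls db_name from
    remote_couch into a same-named database on local_couch, creating it if needed."""
    # Database names containing "/" must be percent-encoded inside a URL
    # (the log shows vdsdata%2fd12%2f...); quote(..., safe="") encodes "/" as %2F.
    source = f"{remote_couch.rstrip('/')}/{urllib.parse.quote(db_name, safe='')}/"
    body = {
        "source": source,       # remote URL, so this is a pull replication
        "target": db_name,      # local database name, not URL-encoded
        "create_target": True,  # mirrors the "+create_target" suffix in the replication id
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{local_couch.rstrip('/')}/_replicate",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` would send it. Running one such request per database, each waiting for the previous one to finish, would resemble the sequential pattern described in this report.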
> I let that replication program run while watching top; CouchDB's share of total RAM crept up to 70%, and then it crashed.
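In top, the %MEM column is a process's resident memory divided by total physical memory. The following sketch reproduces that calculation from `/proc/meminfo` (Linux only; the helper names are hypothetical). Combined with an RSS reading, it lets a monitoring script log the same percentage the reporter watched climb toward 70%.

```python
def parse_meminfo_kb(meminfo_text, key="MemTotal"):
    """Return the value (in kB) for `key` from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0])
    raise KeyError(key)


def mem_percent(rss_kb, meminfo_text):
    """Resident memory as a percentage of physical RAM, roughly top's %MEM."""
    return 100.0 * rss_kb / parse_meminfo_kb(meminfo_text)
```

For example, reading `/proc/meminfo` once and passing each sampled RSS value to `mem_percent` yields a number directly comparable with top's display.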
> Again, the log on the crashing server isn't helpful (it is more or less the same as above).
> The replication program gets through about 8 to 12 databases before CouchDB crashes.
> Each database, once replicated to the target server, takes up around 700 MB on average, uncompacted.
> The databases are all similar (annual data for detectors), with one document per day of data.  Each document is around 700 KB.
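The sizes quoted above allow a rough consistency check. This sketch makes two assumptions: each database holds a full year of daily documents, and "700K" means kilobytes. Under those assumptions, a year of raw document data is about 250 MB, so the ~700 MB uncompacted figure is a bit under three times that. Eight to twelve such databases amount to roughly 5.5 to 8.2 GB replicated before the crash. The constants below only restate the report's numbers.

```python
DAYS_PER_YEAR = 365  # assumption: a full, non-leap year of daily documents
DOC_SIZE_KB = 700    # "each document is around 700K", read as kilobytes (assumption)
DB_SIZE_MB = 700     # reported average uncompacted database size on the target

raw_mb = DAYS_PER_YEAR * DOC_SIZE_KB / 1024
print(f"raw document data per database ~ {raw_mb:.0f} MB")
print(f"uncompacted size / raw data ~ {DB_SIZE_MB / raw_mb:.1f}x")
for n in (8, 12):
    print(f"{n} databases ~ {n * DB_SIZE_MB / 1024:.1f} GB replicated")
```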
> If there is any more information (or more helpful information) I can provide, please let me know.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
