couchdb-dev mailing list archives

From "Darren Gibbard (JIRA)" <j...@apache.org>
Subject [jira] [Created] (COUCHDB-2070) [1.4.0] CouchDB Replication Crashes
Date Tue, 18 Feb 2014 15:50:21 GMT
Darren Gibbard created COUCHDB-2070:
---------------------------------------

             Summary: [1.4.0] CouchDB Replication Crashes
                 Key: COUCHDB-2070
                 URL: https://issues.apache.org/jira/browse/COUCHDB-2070
             Project: CouchDB
          Issue Type: Bug
      Security Level: public (Regular issues)
          Components: Replication
            Reporter: Darren Gibbard


Hi all,
I have an issue that appears to have followed me from v1.2.1 on Erlang R14 through an upgrade
to v1.4.0 on R16B01.

I have 20 "remote" nodes and one "central" node. Each of the remote instances is configured
with bi-directional replication (i.e. no replication is defined on the Central node directly).
There is a single main database of ~600,000 documents, ~11GB in size.
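
For readers unfamiliar with this layout, here is a rough sketch of what "bi-directional replication defined on the remote side" could look like as a pair of replication documents, one pushing to the Central node and one pulling from it. The `source`, `target` and `continuous` fields are standard CouchDB replication fields; the database name, the hostname, the helper name `replication_docs`, and the choice of continuous replication are illustrative assumptions, not details taken from this report.

```python
import json


def replication_docs(remote_db, central_db_url, continuous=True):
    """Build a (push, pull) pair of replication documents for one remote node.

    remote_db is the local database name on the remote node and
    central_db_url is the full URL of the same database on the Central node.
    Both values, and the continuous flag, are assumptions for illustration.
    """
    push = {"source": remote_db, "target": central_db_url, "continuous": continuous}
    pull = {"source": central_db_url, "target": remote_db, "continuous": continuous}
    return push, pull


if __name__ == "__main__":
    # Hypothetical names: "maindb" and central.example.com are placeholders.
    push, pull = replication_docs("maindb", "http://central.example.com:5984/maindb")
    print(json.dumps(push, indent=2))
    print(json.dumps(pull, indent=2))
```

With 20 remote nodes each holding a pair like this, the Central node would be the far end of 40 replications without defining any itself.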

On the remote nodes, and more frequently on the Central node, I get *huge* (3000+ lines) errors
in the logs, seemingly intermittently; I have yet to track down the root cause. The open file
handle limit and ERL_MAX_PORTS are both set to values upwards of 16k.

Other stats:
{noformat}
$ sudo su - couchdb -c "lsof | grep -c ."
1511

$ sudo netstat -npla | grep "ESTAB" | grep -c .
310

$ ps -ef | grep -c "^couchdb" 
19
{noformat}

An example log from a Remote node is: http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01.20140218.log
An example log from the Central node is: http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01_central.20140218.log

The main error line appears to be "{error,{error,req_timedout}}}}", for either "_bulk_docs" on
the remote nodes or "_revs_diff" on the Central node.
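
To check how the timeouts split between the two endpoints, something like the following sketch could be run over a saved copy of a log. It assumes each log entry begins with a line of the form `[timestamp] [level] [pid]` and that the endpoint name appears somewhere in the same entry as `req_timedout`; both are assumptions about the log layout, and the function names are made up for this example.

```python
import re
from collections import Counter

# Assumed entry header: "[<timestamp>] [<level>] [<pid>]" at the start of a line.
ENTRY_START = re.compile(r"^\[[^\]]*\] \[\w+\] \[[^\]]*\]", re.MULTILINE)
ENDPOINTS = ("_bulk_docs", "_revs_diff")


def split_entries(log_text):
    """Split raw log text into entries, each starting at an assumed header line."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    if not starts or starts[0] != 0:
        # Keep any text before the first recognised header as its own entry.
        starts.insert(0, 0)
    bounds = starts + [len(log_text)]
    return [log_text[a:b] for a, b in zip(bounds, bounds[1:])]


def count_timeouts(log_text):
    """Count entries containing req_timedout, grouped by the endpoint they mention."""
    counts = Counter()
    for entry in split_entries(log_text):
        if "req_timedout" not in entry:
            continue
        hits = [name for name in ENDPOINTS if name in entry]
        for name in hits:
            counts[name] += 1
        if not hits:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name, n in count_timeouts(fh.read()).most_common():
            print(name, n)
```

Run as a script, it takes the path to a log file as its only argument and prints one count per endpoint.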



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
