Reply-To: dev@couchdb.apache.org
Message-ID: <205460308.1261180038270.JavaMail.jira@brutus>
Date: Fri, 18 Dec 2009 23:47:18 +0000 (UTC)
From: "Robert Newson (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] Commented: (COUCHDB-597) Replication tasks crash.
In-Reply-To: <1826646313.1260661158168.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/COUCHDB-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792700#action_12792700 ]

Robert Newson commented on COUCHDB-597:
---------------------------------------

I can reproduce at least one problem with replication using the script below. If the replication requests are issued serially, the tasks complete; if they are issued in parallel, they do not.

#!/bin/bash
# bash (not plain sh) is needed for the {0..100000} brace expansion below.

URL=http://localhost:5984

# Create a 60k attachment filled with zero bytes.
dd if=/dev/zero of=att bs=60k count=1

# Remove databases left over from a previous run, then recreate the source.
curl -X DELETE $URL/db1
curl -X DELETE $URL/db2
curl -X DELETE $URL/db3
curl -X DELETE $URL/db4
curl -X DELETE $URL/db5

curl -X PUT $URL/db1

# Even-numbered documents carry a small JSON body; odd-numbered ones carry the attachment.
for COUNT in {0..100000}
do
  curl -s -X PUT -d '{"text":"some text goes here, oh yes."}' $URL/db1/doc$((COUNT * 2)) > /dev/null
  curl -s -H "Expect: " -X PUT --data-binary @att $URL/db1/doc$((COUNT * 2 + 1))/att > /dev/null
done

# At least one of these tasks will fail.
curl http://localhost:5984/_replicate -d "{\"source\":\"$URL/db1\",\"target\":\"db2\",\"create_target\":true}" &
curl http://localhost:5984/_replicate -d "{\"source\":\"$URL/db1\",\"target\":\"db3\",\"create_target\":true}" &
curl http://localhost:5984/_replicate -d "{\"source\":\"$URL/db1\",\"target\":\"db4\",\"create_target\":true}" &
curl http://localhost:5984/_replicate -d "{\"source\":\"$URL/db1\",\"target\":\"db5\",\"create_target\":true}" &

# Wait for all four background requests to return.
wait

> Replication tasks crash.
> ------------------------
>
> Key: COUCHDB-597
> URL: https://issues.apache.org/jira/browse/COUCHDB-597
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.11
> Reporter: Robert Newson
>
> If I kick off 10 replication tasks in quick succession, occasionally one or two of the replication tasks will die and not be resumed.
> It seems that the stat tracking is a little buggy and, under stress, can eventually cause a permanent failure of the supervised replication task:
>
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [<0.80.0>] {error_report,<0.30.0>,
>     {<0.80.0>,supervisor_report,
>      [{supervisor,{local,couch_rep_sup}},
>       {errorContext,shutdown_error},
>       {reason,killed},
>       {offender,
>           [{pid,<0.6700.11>},
>            {name,"fcbb13200a1618cf983b347f4d2c9835+create_target"},
>            {mfa,
>                {gen_server,start_link,
>                    [couch_rep,
>                     ["fcbb13200a1618cf983b347f4d2c9835",
>                      {[{<<"create_target">>,true},
>                        {<<"source">>,<<"http://node:5984/perf-p2">>},
>                        {<<"target">>,<<"perf-p2">>}]},
>                      {user_ctx,null,[<<"_admin">>]}],
>                     []]}},
>            {restart_type,temporary},
>            {shutdown,1},
>            {child_type,worker}]}]}}
> [Fri, 11 Dec 2009 19:00:08 GMT] [error] [emulator] Error in process <0.6705.11> with exit value: {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
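
The comment above contrasts issuing the replication requests serially (they complete) with issuing them in parallel (at least one fails). A minimal sketch of a helper that runs either variant against the same four targets could look like the following. The function names `replication_body` and `run_replications` are hypothetical and not part of the original script; the URL, database names and request body are taken from it.

```shell
#!/bin/bash
# Sketch only: the function names below are illustrative assumptions,
# not part of the original reproduction script.
URL=${URL:-http://localhost:5984}

# Build the JSON body that replicates db1 into the given target database.
replication_body() {
  printf '{"source":"%s/db1","target":"%s","create_target":true}' "$URL" "$1"
}

# Issue the four replication requests one at a time or all at once.
# $1 is "serial" or "parallel"; anything other than "parallel" runs serially.
run_replications() {
  local mode=$1 target
  for target in db2 db3 db4 db5; do
    if [ "$mode" = "parallel" ]; then
      curl -s "$URL/_replicate" -d "$(replication_body "$target")" &
    else
      curl -s "$URL/_replicate" -d "$(replication_body "$target")"
    fi
  done
  # Collects the background requests in parallel mode; returns at once otherwise.
  wait
}
```

Running `run_replications serial` and then, after deleting db2 through db5, `run_replications parallel` against the same populated db1 would give a like-for-like comparison of the two cases described in the comment.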