From: Robert Newson
To: dev@couchdb.apache.org
Date: Sat, 3 Oct 2009 14:34:45 +0100
Subject: Re: Connection refused when inserting document and reached_max_restart_intensity in the log.
Message-ID: <46aeb24f0910030634jb094a9br7cca5af35d39ec6f@mail.gmail.com>
In-Reply-To: <46aeb24f0910030633q4fba3499u827c4572d8fc85dd@mail.gmail.com>

I should point out that my test does this;

1) PUT _config/uuid/algorithm with "random"
2) insert some documents
3) PUT _config/uuid/algorithm with "sequential"
4) insert some documents

If you loop that, and insert as few as 10 documents at 2) and 4), you
will get a connection refused and the stacktrace output, within 60
seconds.

On Sat, Oct 3, 2009 at 2:33 PM, Robert Newson wrote:
> Ok, I've got a little further. If I change my test to much shorter runs
> (even 10 documents), I can reproduce the connection refused symptom
> and the stacktrace I pasted originally in under a minute, every time.
>
> What appears to be happening is that the couch_uuids gen_server is
> failing (being restarted too frequently), part of the supervision tree
> is torn down and rebuilt, and a concurrent write operation fails while
> that is happening. Since I'm pretty sure that's not what should happen
> with Erlang/OTP, it's hopefully a straightforward bug.
>
> Alas, my test client is in Java (using httpclient 4.0, fwiw), so I
> can't easily post a unit test for this right now.
>
> B.
>
> On Sat, Oct 3, 2009 at 1:52 PM, Robert Newson wrote:
>> A subsequent run that encountered the connection refused error did not
>> cause the couch_uuids supervisor to restart it, so the two problems
>> are unrelated.
>>
>> On Sat, Oct 3, 2009 at 1:50 PM, Robert Newson wrote:
>>> Hi,
>>>
>>> Jan suggested I start a thread on dev about a problem I'm encountering
>>> on couchdb trunk. I'm performing long running insertion tests (that
>>> is, millions of inserts) in order to quantify the differences between
>>> batch vs. sync and random identifiers vs. sequential ones. I find it
>>> hard to complete a 5 million insertion run as my client eventually
>>> (and randomly) gets a "connection refused" error from couchdb.
>>> Immediately after that occurs, I can successfully hit couchdb with
>>> curl, so it's transitory. I found the following errors in the log
>>> around the time of the problem;
>>>
>>> =SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
>>>     Supervisor: {local,couch_secondary_services}
>>>     Context:    shutdown
>>>     Reason:     reached_max_restart_intensity
>>>     Offender:   [{pid,<0.5273.0>},
>>>                  {name,uuids},
>>>                  {mfa,{couch_uuids,start,[]}},
>>>                  {restart_type,permanent},
>>>                  {shutdown,brutal_kill},
>>>                  {child_type,worker}]
>>>
>>> [error] [<0.76.0>] {error_report,<0.30.0>,
>>>    {<0.76.0>,supervisor_report,
>>>     [{supervisor,{local,couch_server_sup}},
>>>      {errorContext,child_terminated},
>>>      {reason,shutdown},
>>>      {offender,
>>>          [{pid,<0.2218.0>},
>>>           {name,couch_secondary_services},
>>>           {mfa,{couch_server_sup,start_secondary_services,[]}},
>>>           {restart_type,permanent},
>>>           {shutdown,infinity},
>>>           {child_type,supervisor}]}]}}
>>>
>>> =SUPERVISOR REPORT==== 3-Oct-2009::13:32:18 ===
>>>     Supervisor: {local,couch_server_sup}
>>>     Context:    child_terminated
>>>     Reason:     shutdown
>>>     Offender:   [{pid,<0.2218.0>},
>>>                  {name,couch_secondary_services},
>>>                  {mfa,{couch_server_sup,start_secondary_services,[]}},
>>>                  {restart_type,permanent},
>>>                  {shutdown,infinity},
>>>                  {child_type,supervisor}]
>>>
>>>
>>> =ERROR REPORT==== 3-Oct-2009::13:32:18 ===
>>> Error in process <0.5316.0> with exit value:
>>> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_databases},-1}]},{couch_stats_collector,decrement,1}]}
>>>
>>>
>>> =ERROR REPORT==== 3-Oct-2009::13:32:18 ===
>>> Error in process <0.5312.0> with exit value:
>>> {badarg,[{ets,insert,[stats_hit_table,{{couchdb,open_os_files},-1}]},{couch_stats_collector,decrement,1}]}
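
The four-step loop described at the top of this message could be sketched roughly as follows. This is not the Java/httpclient test mentioned in the thread, only a minimal Python illustration of the same request sequence. The names `Step`, `repro_steps` and `send`, the database name, and the document bodies are all hypothetical. Inserting with POST so that the server assigns the ids is an assumption, as is the config path, which is copied as written in the message and may need adjusting for a given build.

```python
import json
import urllib.request
from typing import Iterator, NamedTuple


class Step(NamedTuple):
    """One HTTP request in the reproduction loop (hypothetical helper type)."""
    method: str
    path: str
    body: str


def repro_steps(db: str, docs_per_phase: int = 10, loops: int = 1,
                config_path: str = "/_config/uuid/algorithm") -> Iterator[Step]:
    """Yield the requests for the loop: set algorithm, insert, switch, insert.

    config_path is taken verbatim from the message; it is an assumption that
    it matches your build (the ini section may be named differently, for
    example "uuids").
    """
    for _ in range(loops):
        for algorithm in ("random", "sequential"):
            # Steps 1 and 3: change the UUID algorithm through the config API.
            # The value is sent as a JSON-encoded string.
            yield Step("PUT", config_path, json.dumps(algorithm))
            # Steps 2 and 4: insert documents. POST without an _id is assumed
            # here so that the server has to generate the identifiers.
            for n in range(docs_per_phase):
                yield Step("POST", f"/{db}", json.dumps({"n": n}))


def send(base_url: str, step: Step) -> int:
    """Send one step to a running server and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + step.path,
        data=step.body.encode("utf-8"),
        method=step.method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

To drive a local server, iterate over `repro_steps("testdb", loops=100)` and pass each step to `send("http://127.0.0.1:5984", step)`; a refused connection would surface as a `urllib.error.URLError`. Keeping the request plan separate from the sending code means the sequence can be inspected without a server.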