From: Paolo Negri
Date: Mon, 17 Oct 2011 14:44:17 +0200
Subject: Re: timeout hitting a database url after launching compaction
To: user@couchdb.apache.org
Reply-To: user@couchdb.apache.org
References: <4E9BE630.10608@gmail.com> <15889B59-1148-42D0-95A9-B80A917D4912@thenoi.se>

On Mon, Oct 17, 2011 at 2:30 PM, Robert Newson wrote:
> Do you have the full stacktrace from couch.log?

I pasted it here https://gist.github.com/1292529

>
> On 17 October 2011 13:04, Paolo Negri wrote:
>> On Mon, Oct 17, 2011 at 1:57 PM, Robert Newson wrote:
>>> Compaction is an online process, there should be no expectation of 500
>>> responses before, during, or after compaction.
>>>
>>> In this case, it seems the couch_server process is blocked for more
>>> than five seconds performing I/O and the gen_server:call from
>>> couch_server:open times out. This timeout has been increased to
>>> infinity since 1.0.0.
>>>
>>> What version are you running?
>>
>> I compiled master from github here are the details
>>
>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>
>> The reason to use master is that we wanted to benefit from the
>> ejson/snappy adoption so I guess I could actually also use the 1.2
>> branch
>>
>> Paolo
>>
>>>
>>> B.
>>>
>>> On 17 October 2011 12:05, Martin Hewitt wrote:
>>>> I disagree, it makes sense as the 5xx error code range is for responses where the server can't fulfil a well-formed, valid client request.
>>>>
>>>> Your GET is well-formed, but the server can't process it as it's working on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be more accurate, but the 5xx prefix is certainly correct.
>>>>
>>>> Martin
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 17 Oct 2011, at 09:29, Paolo Negri wrote:
>>>>
>>>>> I agree on the fact that what happens is pretty clear to explain, I
>>>>> still thought it would be useful for the developers to know since
>>>>> offering a 500 status code for a known system condition is probably
>>>>> something that can be improved.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Paolo
>>>>>
>>>>> On Mon, Oct 17, 2011 at 10:24 AM, CGS wrote:
>>>>>> I am not developer, but it's quite logic, I may say. Once you started the
>>>>>> compaction, your CouchDB is not responsive while the database is preparing
>>>>>> for compaction. Triggering immediately GET, the web instance responds with
>>>>>> status code 500 (internal server error, meaning unresponsive server in this
>>>>>> case). So, nothing unusual in my opinion.
>>>>>>
>>>>>> Cheers,
>>>>>> CGS
>>>>>>
>>>>>> On 10/17/2011 09:57 AM, Paolo Negri wrote:
>>>>>>>
>>>>>>> IO activity is not monitored, there's only one db on the couchdb
>>>>>>> instance and the described job is the only activity executed on this
>>>>>>> machine.
>>>>>>> Delaying the first request on the database url by 30 seconds did
>>>>>>> actually prevent the problem from happening again.
>>>>>>> So the issue seems to happen specifically at the moment right after
>>>>>>> compaction is started.
>>>>>>> The database is about 7GB big once compressed, the server is hosted on
>>>>>>> ec2 with the database directory placed on his own dedicated ephemeral
>>>>>>> storage.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Paolo
>>>>>>>
>>>>>>> On Fri, Oct 14, 2011 at 9:05 PM, Paul Davis wrote:
>>>>>>>>
>>>>>>>> Do you monitor IO activity or system responsiveness when you're doing
>>>>>>>> this. I've seen some compactions wallop a system when it switches over
>>>>>>>> due to removing large old files and such. It doesn't sound like this
>>>>>>>> is big enough for that case but it might be something worth checking.
>>>>>>>>
>>>>>>>> On Fri, Oct 14, 2011 at 3:41 AM, Paolo Negri wrote:
>>>>>>>>>
>>>>>>>>> Dear list,
>>>>>>>>>
>>>>>>>>> We have a script that does the following (strictly sequentially)
>>>>>>>>>
>>>>>>>>> 1) update 300K docs in a db
>>>>>>>>> 2) launch compaction of the db
>>>>>>>>> 3) poll at a 30 sec frequency http://127.0.0.1:5984/database to know
>>>>>>>>> when compaction completed
>>>>>>>>>
>>>>>>>>> Last night we got a timeout error during 3, we think that this might
>>>>>>>>> be because the first polling (GET http://127.0.0.1:5984/database) is
>>>>>>>>> done right after triggering compaction
>>>>>>>>>
>>>>>>>>> I thought the dev team might be interested in knowing that this is
>>>>>>>>> happening
>>>>>>>>>
>>>>>>>>> There's no other activity on the couchdb instance other than what
>>>>>>>>> described in this email.
>>>>>>>>>
>>>>>>>>> ERROR unexpectd response checking compaction db: {ok,"500",
>>>>>>>>>     [{"Server","CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>>>>>>>      {"Date","Fri, 14 Oct 2011 01:46:37 GMT"},
>>>>>>>>>      {"Content-Type","text/plain; charset=utf-8"},
>>>>>>>>>      {"Content-Length","350"},
>>>>>>>>>      {"Cache-Control","must-revalidate"}],
>>>>>>>>>
>>>>>>>>> <<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
>>>>>>>>>   [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
>>>>>>>>> [{user_ctx,\\n              {user_ctx,null,\\n
>>>>>>>>> [<<\\\"_admin\\\">>],\\n<<\\\"{couch_httpd_auth,
>>>>>>>>> default_authentication_handler}\\\">>}}]},\\n     infinity]}\"}\n">>}
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Paolo
>>>>>>>>>
>>>>>
>>>>> --
>>>>> Engineering
>>>>> http://www.wooga.com | phone +49-30-8962 5058 | fax +49-30-8964 9064
>>>>>
>>>>> wooga GmbH | Saarbruecker Str. 38 | 10405 Berlin | Germany
>>>>> Registered office: Berlin; HRB 117846 B
>>>>> Register court: Berlin-Charlottenburg
>>>>> Management: Jens Begemann, Philipp Moeser
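
For reference, step 3 of the script described in the thread (polling the database URL until compaction completes), combined with the 30-second initial delay that reportedly avoided the error, could look roughly like the sketch below. This is not the original script. The function names `fetch_db_info` and `wait_for_compaction`, the `max_polls` limit, and the choice to retry on 5xx responses are assumptions made for illustration; the `compact_running` flag is the field CouchDB reports in the database info document returned by GET on the database URL.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_db_info(url):
    """GET the database URL; return (status, parsed JSON or None).

    HTTP error statuses (4xx/5xx) are returned rather than raised.
    Connection-level failures still propagate as exceptions.
    """
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return err.code, None


def wait_for_compaction(url, fetch=fetch_db_info, sleep=time.sleep,
                        initial_delay=30, interval=30, max_polls=120):
    """Poll until the database reports compact_running == false.

    Returns True when compaction has finished, False if max_polls
    (a hypothetical safety limit) is exhausted first.
    """
    # Wait before the first request: per the thread, polling right
    # after triggering compaction is what produced the 500 response.
    sleep(initial_delay)
    for _ in range(max_polls):
        status, info = fetch(url)
        if status == 200 and info is not None:
            if not info.get("compact_running", False):
                return True
        elif 400 <= status < 500:
            # A client error (e.g. 404) will not fix itself by waiting.
            raise RuntimeError("unexpected status %d from %s" % (status, url))
        # Either compaction is still running or the server answered 5xx;
        # assumption: treat 5xx as transient and try again later.
        sleep(interval)
    return False
```

Called as `wait_for_compaction("http://127.0.0.1:5984/database")` after compaction has been triggered, it blocks until the database info no longer reports a running compaction. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a live server.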