Date: Thu, 17 Nov 2011 12:17:52 +0000 (UTC)
From: "Nils Breunese (Commented) (JIRA)"
To: dev@couchdb.apache.org
Reply-To: dev@couchdb.apache.org
Message-ID: <904691628.38702.1321532272125.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1078581516.38690.1321532151862.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (COUCHDB-1343) Starting view cleanup fails with a timeout
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[ 
https://issues.apache.org/jira/browse/COUCHDB-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152017#comment-13152017 ]

Nils Breunese commented on COUCHDB-1343:
----------------------------------------

I discussed this issue in #couchdb on Freenode. Here's a transcript of the discussion about this issue:

----
<breun> Hi all. I have a production database here and if I POST to /db/_view_cleanup I get a 500 Internal Server Error like this: http://friendpaste.com/7YSu5IImVg6HAWzmIrqqG3
[10:32] <breun> That 'reason' doesn't make much sense to me. What's next?
[11:02] <+jan____> breun: lemme see
[11:03] <+jan____> breun: this looks like heavy IO?
[11:12] <breun> jan____: Hm, I only get this error for this one database. There are other databases on the same server for which I can run view cleanup without error. I don't think this is the busiest database.
[11:15] <+jan____> breun: technically what happens is that there's a process inside Erlang that we call to get information about the design doc you want to compact, that request for information times out when reading the design doc
[11:15] <+jan____> if that makes sense
[11:16] <+jan____> breun: do you have more stacktrace?
[11:16] <breun> jan____: I can request all design docs (it has two) for that database just fine via Futon.
[11:17] <+jan____> weird
[11:18] <+benoitc> how is the disk usage?
[11:19] <breun> jan____: I don't have more stacktrace, no. /_log shows about the same info: http://friendpaste.com/1BmLsiY4y4VgENlmxtMref
[11:19] <breun> benoitc: I don't know, I'm not sure I can access that information in this production environment. :S Let me check.
[11:22] <+jan____> breun: ok, I see where the timeout happens, but I don't quite know how it can happen
[11:22] <+jan____> breun: is there anything else running on that design doc, compaction, a long view build, anything?
[11:24] <breun> jan____: According to the status page there is nothing running. I can reproduce this timeout every time. It's about 5 seconds or so, I think?
[11:24] <+jan____> benoitc: the timeout is when sending a msg to the couch_view_group gen_server, not sure if that is disk bound
[11:25] <+jan____> breun: yes, 5 seconds is the timeout.
[11:26] <+benoitc> jan____: yes, i didn't read the code yet, but i supposed it happened when passing results to that
[11:26] <+benoitc> but well i'm not familiar at all with the view cleanup code
[11:27] <+jan____> benoitc: I'm looking at the code, the only IO that handle_call for get_group_infor does is couch_file:size()
[11:27] <+jan____> not saying it isn't significant, but seems weird.
[11:28] <+benoitc> yup
[11:29] <+jan____> breun: I'm trying to find out what the cause for this is and what you can do with this now. do you have the option to restart the couch?
[11:31] <breun> jan____: Restarting CouchDB has been the solution to all problems I ever brought to this channel. I thought CouchDB wasn't built on Windows? :)
[11:32] <+jan____> breun: lol :)
[11:32] <breun> jan____: Let me see if I can restart it. But then we might never find out what's wrong here, right?
[11:32] <+jan____> breun: you could log into the erlang instance and just kill the view server pid, but you said you don't have much access
[11:32] <+jan____> breun: let's record this instance in an issue
[11:33] <breun> jan____: I definitely can't log into the erlang instance. I don't have shell access to the server it's running on.
[11:33] <+jan____> the module in question isn't too big, I figure a review would possibly find a race condition or somesuch
[11:34] <+jan____> either way though this calls for better instrumentation and more fine-grained control over components running inside couch
[11:37] <breun> jan____: I'll request a restart and see if that helps. And I'll create a ticket for the issue. Thanks for looking.
[11:39] <+jan____> breun: no probs, this really shouldn't happen
[11:40] <+jan____> or if it does, we should have better ways to rectify the situation
----

> Starting view cleanup fails with a timeout
> ------------------------------------------
>
>                 Key: COUCHDB-1343
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1343
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.0.2
>         Environment: Linux
>            Reporter: Nils Breunese
>            Priority: Minor
>
> Our CouchDB maintenance script (daily compaction, view cleanup, etc.) recently started reporting the following error every day:
> ----
> Error cleaning up views of database 'mashup' for the CouchDB instance at http://hostname:8080
> ----
> When trying to start view cleanup for this particular database (there are more databases in this CouchDB instance) I get the following in the log:
> ----
> [Thu, 17 Nov 2011 09:28:23 GMT] [error] [<0.6547.171>] Uncaught error in HTTP request: {exit,
>                                  {timeout,
>                                   {gen_server,call,
>                                    [<0.19070.94>,request_group_info]}}}
> ----
> And the following HTTP 500 response:
> ----
> HTTP/1.1 500 Internal Server Error
> Content-Length: 83
> Server: CouchDB/1.0.2 (Erlang OTP/R13B)
> Date: Thu, 17 Nov 2011 09:28:23 GMT
> Content-Type: text/plain;charset=utf-8
> Cache-Control: must-revalidate
>
> {"error":"timeout","reason":"{gen_server,call,[<0.19070.94>,request_group_info]}"}
> ----
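
For anyone whose maintenance script hits the same 500 response, here is a minimal Python sketch of how the script might tell this particular gen_server call timeout apart from other failures. It only inspects the response body quoted in the issue. The function name parse_cleanup_error and the regular expression are assumptions made for illustration; they are not part of CouchDB or of the reporter's script.

```python
import json
import re

# 500 response body copied verbatim from the issue report.
SAMPLE_BODY = (
    '{"error":"timeout","reason":'
    '"{gen_server,call,[<0.19070.94>,request_group_info]}"}'
)

# Assumed pattern (illustrative only): the Erlang term as it appears in
# the "reason" string, capturing the pid and the request atom.
_GEN_SERVER_CALL = re.compile(r"\{gen_server,call,\[(<[\d.]+>),(\w+)\]\}")


def parse_cleanup_error(status, body):
    """Return (pid, request) for a gen_server call timeout, else None.

    Hypothetical helper: `status` is the HTTP status code and `body`
    the raw response text from POST /db/_view_cleanup.
    """
    if status != 500:
        return None
    try:
        doc = json.loads(body)
    except ValueError:
        return None  # body was not JSON
    if not isinstance(doc, dict) or doc.get("error") != "timeout":
        return None
    match = _GEN_SERVER_CALL.search(str(doc.get("reason", "")))
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_cleanup_error(500, SAMPLE_BODY))
```

A script could log the returned pid and request name so that someone with shell access to the Erlang instance knows which process to look at, as suggested in the transcript.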