couchdb-dev mailing list archives

From "Nils Breunese (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1343) Starting view cleanup fails with a timeout
Date Thu, 17 Nov 2011 12:17:52 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152017#comment-13152017 ]

Nils Breunese commented on COUCHDB-1343:
----------------------------------------

I discussed this issue in #couchdb on Freenode. Here's a transcript of the discussion:

----
<breun> Hi all. I have a production database here and if I POST to /db/_view_cleanup
I get a 500 Internal Server Error like this: http://friendpaste.com/7YSu5IImVg6HAWzmIrqqG3

[10:32] <breun> That 'reason' doesn't make much sense to me. What's next? 
[11:02] <+jan____> breun: lemme see 
[11:03] <+jan____> breun: this looks like heavy IO? 
[11:12] <breun> jan____: Hm, I only get this error for this one database. There are
other databases on the same server for which I can run view cleanup without error. I don't
think this is the busiest database. 
[11:15] <+jan____> breun: technically what happens is that there's a process inside
Erlang that we call to get information about the design doc you want to compact, that request
for information times out when reading the design doc 
[11:15] <+jan____> if that makes sense 
[11:16] <+jan____> breun: do you have more stacktrace? 
[11:16] <breun> jan____: I can request all design docs (it has two) for that database
just fine via Futon. 
[11:17] <+jan____> weird 
[11:18] <+benoitc> how is the disk usage? 
[11:19] <breun> jan____: I don't have more stacktrace, no. /_log shows about the same
info: http://friendpaste.com/1BmLsiY4y4VgENlmxtMref 
[11:19] <breun> benoitc: I don't know, I'm not sure I can access that information in
this production environment. :S Let me check. 
[11:22] <+jan____> breun: ok, I see where the timeout happens, but I don't quite know how
it can happen 
[11:22] <+jan____> breun: is there anything else running on that design doc, compaction,
a long view build, anything? 
[11:24] <breun> jan____: According to the status page there is nothing running. I can
reproduce this timeout every time. It's about 5 seconds or so, I think? 
[11:24] <+jan____> benoitc: the timeout is when sending a msg to the couch_view_group
gen_server, not sure if that is disk bound 
[11:25] <+jan____> breun: yes, 5 seconds is the timeout. 
[11:26] <+benoitc> jan____: yes, I didn't read the code yet, but I supposed it happened
when passing results to that 
[11:26] <+benoitc> but well, I'm not familiar at all with the view cleanup code 
[11:27] <+jan____> benoitc: I'm looking at the code, the only IO that handle_call for
get_group_info does is couch_file:size() 
[11:27] <+jan____> not saying it isn't significant, but seems weird. 
[11:28] <+benoitc> yup 
[11:29] <+jan____> breun: I'm trying to find out what the cause for this is and what
you can do with this now. do you have the option to restart the couch? 
[11:31] <breun> jan____: Restarting CouchDB has been the solution to all problems I
ever brought to this channel. I thought CouchDB wasn't built on Windows? :) 
[11:32] <+jan____> breun: lol :) 
[11:32] <breun> jan____: Let me see if I can restart it. But then we might never find
out what's wrong here, right? 
[11:32] <+jan____> breun: you could log into the erlang instance and just kill the view
server pid, but you said you don't have much access 
[11:32] <+jan____> breun: let's record this instance in an issue 
[11:33] <breun> jan____: I definitely can't log into the erlang instance. I don't have
shell access to the server it's running on. 
[11:33] <+jan____> the module in question isn't too big, I figure a review would possibly
find a race condition or somesuch 
[11:34] <+jan____> either way though this calls for better instrumentation and more
fine grained control over components running inside couch 
[11:37] <breun> jan____: I'll request a restart and see if that helps. And I'll create
a ticket for the issue. Thanks for looking. 
[11:39] <+jan____> breun: no probs, this really shouldn't happen 
[11:40] <+jan____> or if it does, we should have better ways to rectify the situation

----
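The failure mode jan describes (the view cleanup request makes a synchronous call to the couch_view_group gen_server, and the caller gives up after the default 5-second timeout) can be modelled with a toy Python sketch. This is an analogy, not CouchDB code: `Server`, `call`, `CallTimeout` and `slow_handler` are hypothetical names, and the sleeping handler only stands in for whatever keeps the real process from replying in time. The point it shows is that the timeout is raised on the caller's side while the server may still be busy.

```python
import queue
import threading
import time


class CallTimeout(Exception):
    """Raised by the caller when no reply arrives in time.

    Loosely mirrors Erlang's exit {timeout, {gen_server, call, [Pid, Request]}}.
    """


class Server:
    """Toy stand-in for a gen_server: one worker thread handles requests in order."""

    def __init__(self, handler):
        self._inbox = queue.Queue()
        self._handler = handler
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        # Handle one message at a time; a slow request delays every later one.
        while True:
            request, reply_box = self._inbox.get()
            reply_box.put(self._handler(request))

    def call(self, request, timeout=5.0):
        """Send a request and block for the reply.

        The 5.0 s default mirrors gen_server:call/2, whose default is 5000 ms.
        """
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((request, reply_box))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            # The caller gives up; the server thread may still be working.
            raise CallTimeout((request, timeout)) from None


def slow_handler(request):
    """Hypothetical handler: one request type is slow to answer."""
    if request == "request_group_info":
        time.sleep(0.5)  # stands in for a handler that is blocked or stuck
    return {"request": request}
```

With `s = Server(slow_handler)`, `s.call("ping")` returns immediately, while `s.call("request_group_info", timeout=0.1)` raises `CallTimeout`, much as the HTTP handler in the log exits with `{timeout, {gen_server, call, [...]}}`.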
                
> Starting view cleanup fails with a timeout
> ------------------------------------------
>
>                 Key: COUCHDB-1343
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1343
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 1.0.2
>         Environment: Linux
>            Reporter: Nils Breunese
>            Priority: Minor
>
> Our CouchDB maintenance script (daily compaction, view cleanup, etc.) recently started
> reporting the following error every day:
> ----
> Error cleaning up views of database 'mashup' for the CouchDB instance at http://hostname:8080
> ----
> When trying to start view cleanup for this particular database (there are more databases
> in this CouchDB instance) I get the following in the log:
> ----
> [Thu, 17 Nov 2011 09:28:23 GMT] [error] [<0.6547.171>] Uncaught error in HTTP request: {exit,
>                                  {timeout,
>                                   {gen_server,call,
>                                    [<0.19070.94>,request_group_info]}}}
> ----
> And the following HTTP 500 response:
> ----
> HTTP/1.1 500 Internal Server Error
> Content-Length: 83
> Server: CouchDB/1.0.2 (Erlang OTP/R13B)
> Date: Thu, 17 Nov 2011 09:28:23 GMT
> Content-Type: text/plain;charset=utf-8
> Cache-Control: must-revalidate
> {"error":"timeout","reason":"{gen_server,call,[<0.19070.94>,request_group_info]}"}
> ----
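A maintenance script like the one described above could report the reason CouchDB gives instead of only a generic error. The Python sketch below is an illustration under stated assumptions, not the reporter's script: `start_view_cleanup` and `parse_cleanup_response` are hypothetical helpers, and it assumes CouchDB accepts `POST /{db}/_view_cleanup` with `Content-Type: application/json`, answers `202` with `{"ok":true}` when cleanup starts, and otherwise returns a JSON body with `error` and `reason` fields like the 500 response shown above.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def parse_cleanup_response(status, body):
    """Turn a _view_cleanup HTTP response into (ok, message).

    Assumes success is a 2xx status with {"ok": true} and failure is a JSON
    object with "error" and "reason" keys, as in the 500 response above.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        return False, "HTTP %d with non-JSON body: %r" % (status, body)
    if not isinstance(doc, dict):
        return False, "HTTP %d with unexpected body: %r" % (status, body)
    if 200 <= status < 300 and doc.get("ok"):
        return True, "view cleanup started"
    return False, "HTTP %d: %s (%s)" % (status, doc.get("error"), doc.get("reason"))


def start_view_cleanup(base_url, db, timeout=30):
    """POST /{db}/_view_cleanup and report the outcome (hypothetical helper)."""
    url = "%s/%s/_view_cleanup" % (base_url.rstrip("/"), urllib.parse.quote(db, safe=""))
    req = urllib.request.Request(
        url, data=b"", method="POST",
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_cleanup_response(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the 500 above) arrive as HTTPError.
        return parse_cleanup_response(err.code, err.read().decode("utf-8", "replace"))
    except urllib.error.URLError as err:
        # Connection-level failures: no HTTP response at all.
        return False, "request failed: %s" % (err.reason,)
```

For the 500 response above, `start_view_cleanup("http://hostname:8080", "mashup")` would return `(False, "HTTP 500: timeout ({gen_server,call,[<0.19070.94>,request_group_info]})")`, so the daily report can include the actual reason rather than only "Error cleaning up views".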

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
