couchdb-user mailing list archives

From Paolo Negri <paolo.ne...@wooga.net>
Subject Re: timeout hitting a database url after launching compaction
Date Wed, 26 Oct 2011 08:18:44 GMT
I just wanted to add some more information about this behavior. The
problem happens not just right after triggering compaction; it can happen
at any point while compaction is in progress. Tonight we got the same
error 20 minutes after launching compaction.

Thanks,

Paolo

On Mon, Oct 17, 2011 at 2:44 PM, Paolo Negri <paolo.negri@wooga.net> wrote:
> On Mon, Oct 17, 2011 at 2:30 PM, Robert Newson <rnewson@apache.org> wrote:
>> Do you have the full stacktrace from couch.log?
>
> I pasted it here https://gist.github.com/1292529
>
>>
>> On 17 October 2011 13:04, Paolo Negri <paolo.negri@wooga.net> wrote:
>>> On Mon, Oct 17, 2011 at 1:57 PM, Robert Newson <rnewson@apache.org> wrote:
>>>> Compaction is an online process; there should be no expectation of 500
>>>> responses before, during, or after compaction.
>>>>
>>>> In this case, it seems the couch_server process is blocked for more
>>>> than five seconds performing I/O and the gen_server:call from
>>>> couch_server:open times out. This timeout has been increased to
>>>> infinity since 1.0.0.
>>>>
>>>> What version are you running?
>>>
>>> I compiled master from GitHub; here are the details:
>>>
>>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>
>>> The reason to use master is that we wanted to benefit from the
>>> ejson/snappy adoption, so I guess I could also use the 1.2
>>> branch.
>>>
>>> Paolo
>>>
>>>>
>>>> B.
>>>>
>>>> On 17 October 2011 12:05, Martin Hewitt <martin@thenoi.se> wrote:
>>>>> I disagree; it makes sense, as the 5xx error code range is for responses
>>>>> where the server can't fulfil a well-formed, valid client request.
>>>>>
>>>>> Your GET is well-formed, but the server can't process it as it's working
>>>>> on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be
>>>>> more accurate, but the 5xx prefix is certainly correct.
>>>>>
>>>>> Martin
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 17 Oct 2011, at 09:29, Paolo Negri <paolo.negri@wooga.net> wrote:
>>>>>
>>>>>> I agree that what happens is easy to explain. I
>>>>>> still thought it would be useful for the developers to know, since
>>>>>> returning a 500 status code for a known system condition is probably
>>>>>> something that can be improved.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>> On Mon, Oct 17, 2011 at 10:24 AM, CGS <cgsmcmlxxv@gmail.com> wrote:
>>>>>>> I am not a developer, but it seems quite logical to me. Once you start
>>>>>>> the compaction, your CouchDB is not responsive while the database is
>>>>>>> preparing for compaction. If you trigger a GET immediately, the web
>>>>>>> instance responds with status code 500 (internal server error, meaning
>>>>>>> an unresponsive server in this case). So, nothing unusual in my opinion.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> CGS
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/17/2011 09:57 AM, Paolo Negri wrote:
>>>>>>>>
>>>>>>>> IO activity is not monitored. There's only one db on the CouchDB
>>>>>>>> instance, and the described job is the only activity executed on this
>>>>>>>> machine.
>>>>>>>> Delaying the first request on the database URL by 30 seconds did
>>>>>>>> actually prevent the problem from happening again.
>>>>>>>> So the issue seems to happen specifically at the moment right after
>>>>>>>> compaction is started.
>>>>>>>> The database is about 7GB once compressed. The server is hosted on
>>>>>>>> EC2 with the database directory placed on its own dedicated ephemeral
>>>>>>>> storage.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Paolo
>>>>>>>>
>>>>>>>> On Fri, Oct 14, 2011 at 9:05 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Do you monitor IO activity or system responsiveness when you're doing
>>>>>>>>> this? I've seen some compactions wallop a system when it switches over,
>>>>>>>>> due to removing large old files and such. It doesn't sound like this
>>>>>>>>> is big enough for that case, but it might be something worth checking.
>>>>>>>>>
>>>>>>>>> On Fri, Oct 14, 2011 at 3:41 AM, Paolo Negri <paolo.negri@wooga.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Dear list,
>>>>>>>>>>
>>>>>>>>>> We have a script that does the following (strictly sequentially):
>>>>>>>>>>
>>>>>>>>>> 1) update 300K docs in a db
>>>>>>>>>> 2) launch compaction of the db
>>>>>>>>>> 3) poll http://127.0.0.1:5984/database every 30 seconds to know
>>>>>>>>>> when compaction has completed
>>>>>>>>>>
>>>>>>>>>> Last night we got a timeout error during step 3. We think this might
>>>>>>>>>> be because the first poll (GET http://127.0.0.1:5984/database) is
>>>>>>>>>> done right after triggering compaction.
>>>>>>>>>>
>>>>>>>>>> I thought the dev team might be interested in knowing that this is
>>>>>>>>>> happening.
>>>>>>>>>>
>>>>>>>>>> There's no other activity on the CouchDB instance other than what is
>>>>>>>>>> described in this email.
>>>>>>>>>>
>>>>>>>>>> ERROR unexpectd response checking compaction db:
>>>>>>>>>> {ok,"500",
>>>>>>>>>>     [{"Server","CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>>>>>>>>      {"Date","Fri, 14 Oct 2011 01:46:37 GMT"},
>>>>>>>>>>      {"Content-Type","text/plain; charset=utf-8"},
>>>>>>>>>>      {"Content-Length","350"},
>>>>>>>>>>      {"Cache-Control","must-revalidate"}],
>>>>>>>>>> <<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
>>>>>>>>>>   [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
>>>>>>>>>> [{user_ctx,\\n              {user_ctx,null,\\n
>>>>>>>>>> [<<\\\"_admin\\\">>],\\n<<\\\"{couch_httpd_auth,
>>>>>>>>>> default_authentication_handler}\\\">>}}]},\\n
>>>>>>>>>>     infinity]}\"}\n">>}
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Paolo
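
Step 3 of the script described above (polling the database URL until compaction finishes) could be made tolerant of an occasional 500 or timeout. The following is only a minimal sketch in Python, not the script used in this thread: the names `http_fetch` and `wait_for_compaction`, the 10-second request timeout, and the limit of five consecutive failures are all assumptions chosen for illustration. It relies on the `compact_running` field that CouchDB returns in the database info document. It works around the symptom on the client side and does not address the underlying timeout.

```python
import json
import time
import urllib.error
import urllib.request

# URL taken from the original message; everything else here is illustrative.
DB_URL = "http://127.0.0.1:5984/database"


def http_fetch(url=DB_URL, timeout=10.0):
    """GET the database info document.

    Returns (status, body); status is None on a timeout or connection error.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here, including the 500 seen in the thread.
        return err.code, err.read().decode("utf-8", "replace")
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return None, ""


def wait_for_compaction(fetch=http_fetch, interval=30.0, max_errors=5,
                        sleep=time.sleep):
    """Poll until compact_running is false; return the number of polls made.

    Up to max_errors consecutive failed polls are tolerated before giving up.
    """
    polls = 0
    errors = 0
    while True:
        status, body = fetch()
        polls += 1
        if status == 200:
            errors = 0  # a good response resets the failure streak
            if not json.loads(body).get("compact_running", False):
                return polls
        else:
            errors += 1
            if errors > max_errors:
                raise RuntimeError(
                    "giving up after %d consecutive failed polls "
                    "(last status: %s)" % (errors, status))
        sleep(interval)
```

Calling `wait_for_compaction()` with no arguments polls the URL from the original message every 30 seconds. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running CouchDB.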
