couchdb-user mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject Re: "view_conflicts" test fail
Date Sun, 05 Apr 2009 13:12:04 GMT

On Apr 4, 2009, at 6:53 PM, Patric Fors wrote:

>
> On 5 Apr 2009, at 00:23, Adam Kocoloski wrote:
>
>> On Apr 4, 2009, at 5:44 PM, Patric Fors wrote:
>>
>>> Hi,
>>>
>>> Should I be worried that the "view_conflicts" test fails in the  
>>> Test Suite?
>>> I mean, is it the test that fails, or is it couchdb that fails the  
>>> test. :-)
>>
>> Hi Patric, did you happen to run that test with Safari?   
>> view_conflicts fails for me in Safari 4, but passes in Firefox 3  
>> and in the command-line runner.  In other words, I think it's the  
>> test that fails, not Couch :-)
>
> Aha, thanks!
> And, yes, Safari was the browser I used, I confess :-)
> Ran it again with Firefox and it's all good: 44 of 44 test(s) run, 0  
> failures (55178 ms)
>
> Hm...Command-line runner? Must have missed that one.
> Well, while we are on the command line, I guess these errors are  
> also part of the Test Suite's tests?
>
> [info] [<0.10759.0>] 127.0.0.1 - - 'POST' /test_suite_db/ 
> _ensure_full_commit 201

<snipped file descriptor traceback>

> [info] [<0.10759.0>] 127.0.0.1 - - 'POST' /_restart 200
>
>
> /Patric

Hi Patric, funny you should bring that up.  I've been trying to  
understand the source of those tracebacks myself.  Short answer is  
that you probably don't have anything to worry about.  Long answer  
follows ...

CouchDB uses a single file on disk for each database it creates, and  
all access to that file goes through a reference-counted gen_server  
using couch_file as the callback module.  The tracebacks in the logs  
occur when a couch_file gen_server terminates abnormally, where  
"abnormally" just means that the reason given in the exit signal is  
something other than "normal".  It happens rarely, and only when a  
database is deleted or the server is restarted, both of which occur  
much more frequently in the test suite than they do in normal  
operation.   It's not necessarily indicative of a problem.
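
To make the normal/abnormal distinction concrete, here is a minimal  
sketch.  It is not CouchDB code: the module name exit_reason_demo, the  
{stop, Reason} cast and the reason server_gone are all made up for  
illustration, and only the gen_server callbacks themselves are standard  
OTP.  It stops the same kind of server twice, once with reason normal  
and once with a non-normal reason.  terminate/2 runs both times, but  
only the second stop should make the logger print a crash report.

```erlang
-module(exit_reason_demo).
-behaviour(gen_server).

-export([run/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical stand-in for a file-owning server such as couch_file.
%% The state is just the pid that wants to hear about the cleanup.
init(Owner) ->
    {ok, Owner}.

handle_call(_Request, _From, Owner) ->
    {reply, ok, Owner}.

%% Made-up message: stop the server with whatever reason we are given.
handle_cast({stop, Reason}, Owner) ->
    {stop, Reason, Owner}.

handle_info(_Msg, Owner) ->
    {noreply, Owner}.

%% The same cleanup runs whether Reason is normal or not.
terminate(Reason, Owner) ->
    Owner ! {cleaned_up, Reason},
    ok.

code_change(_OldVsn, Owner, _Extra) ->
    {ok, Owner}.

%% Start an unlinked server, stop it with each reason in turn, and
%% collect the reasons that terminate/2 reported back.
run() ->
    lists:map(
      fun(Reason) ->
              {ok, Pid} = gen_server:start(?MODULE, self(), []),
              gen_server:cast(Pid, {stop, Reason}),
              receive
                  {cleaned_up, Seen} -> Seen
              end
      end,
      [normal, server_gone]).
```

Calling exit_reason_demo:run() from the shell returns the list of  
reasons seen by terminate/2.  The crash report for server_gone shows up  
in the log even though the cleanup was identical in both cases.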

I believe the issue is one of message ordering.  In normal operation  
couch_ref_counter is supposed to stop couch_file when the DB is  
deleted or the server restarted.  In your log, couch_ref_counter is  
the neighbour at <0.10708.0>.  Take a look at the ref_counter's  
message queue:

{messages, [{'DOWN',#Ref<0.0.0.128931>,process,<0.10644.0>,killed}]},

When the ref_counter processes that message I believe it will trigger  
a normal shutdown of the couch_file.  Unfortunately, couch_file got  
the message about the couch_server at <0.10644.0> going down first, so  
you see what looks like a crash.  The reason this is not a problem is  
that couch_file doesn't do anything differently for a normal or  
abnormal termination.  The only difference is that the Erlang logger  
pukes out this stacktrace if it's an abnormal termination.
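
For anyone wondering where a message like the one sitting in the  
ref_counter's queue comes from, here is a small sketch using a plain  
monitor.  Again, this is not CouchDB code: the module name down_demo and  
the idle process are hypothetical, and the only assumption is that the  
ref_counter watches the server with erlang:monitor/2, which is what the  
'DOWN' tuple in the log suggests.

```erlang
-module(down_demo).
-export([run/0]).

%% Monitor a process, kill it, and return the reason carried by the
%% resulting 'DOWN' message.
run() ->
    %% Hypothetical stand-in for the process being watched.
    Pid = spawn(fun() -> receive stop -> ok end end),
    Ref = erlang:monitor(process, Pid),
    %% An untrappable kill; the victim's exit reason becomes 'killed'.
    exit(Pid, kill),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            Reason
    end.
```

The monitoring process only learns about the death when it gets around  
to reading that message, which is why another process can react to the  
same event first.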

I think we should look into refactoring the couch_file/ 
couch_ref_counter stuff a bit; the current workflow (server spawns  
file and unlinks, server spawns ref_counter, ref_counter links to  
file) is pretty tough to follow and opens us up to these occasional  
tracebacks in the logs.  Anyway, thanks for listening.  Cheers,

Adam

P.S. Any Erlangers out there might have noticed something odd here.   
couch_server spawn_links a couch_file and then unlinks it, so why does  
couch_file terminate when the server does?  In the thread below,  
Chandru Mullaparthi (of ibrowse fame) pointed out an undocumented OTP  
feature that seems to be responsible:

http://groups.google.com/group/erlang-programming/browse_thread/thread/8ab392fedcad19b6

