incubator-couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-761) Timeouts in couch_log are masked, crashes callers
Date Sun, 13 Jun 2010 15:24:13 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878394#action_12878394 ]

Adam Kocoloski commented on COUCHDB-761:
----------------------------------------

I needed one more patch to get_level_integer() to get make check running, since some of the
tests call couch code that tries to log when couch_log is not running.  I've inlined it below.
I've committed it on trunk and backported it to 0.10.x.  I'm holding off on 0.11.x because Jan
has a monster fix for that branch in the works.


diff --git a/src/couchdb/couch_log.erl b/src/couchdb/couch_log.erl
index 5c8a5e5..2d62cbb 100644
--- a/src/couchdb/couch_log.erl
+++ b/src/couchdb/couch_log.erl
@@ -81,7 +81,11 @@ get_level() ->
     level_atom(get_level_integer()).
 
 get_level_integer() ->
-    ets:lookup_element(?MODULE, level, 2).
+    try
+        ets:lookup_element(?MODULE, level, 2)
+    catch error:badarg ->
+        ?LEVEL_ERROR
+    end.
 
 set_level_integer(Int) ->
     gen_event:call(error_logger, couch_log, {set_level_integer, Int}).
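
For readers who want to try the fallback outside of CouchDB, here is a minimal standalone sketch of the same pattern. It relies on the fact that ets:lookup_element/3 raises error:badarg when the named table does not exist. The module name, the table name and the value 3 for ?LEVEL_ERROR are placeholders chosen for this example, not taken from couch_log.

```erlang
-module(level_fallback_demo).
-export([get_level_integer/0, demo/0]).

%% Placeholder value for illustration only; couch_log defines its own ?LEVEL_ERROR.
-define(LEVEL_ERROR, 3).
%% Hypothetical table name standing in for the table owned by the logger.
-define(TABLE, level_fallback_demo_table).

%% Same shape as the patched function: fall back to the error level
%% when the table (i.e. the logger) is not there.
get_level_integer() ->
    try
        ets:lookup_element(?TABLE, level, 2)
    catch error:badarg ->
        ?LEVEL_ERROR
    end.

demo() ->
    %% The table does not exist yet, so the lookup raises badarg
    %% and the fallback value is returned.
    Before = get_level_integer(),
    %% Create the table and store a level, as a running logger would.
    ?TABLE = ets:new(?TABLE, [named_table, set, protected]),
    true = ets:insert(?TABLE, {level, 1}),
    After = get_level_integer(),
    true = ets:delete(?TABLE),
    {Before, After}.
```

Calling level_fallback_demo:demo() should return {3, 1}: the fallback before the table exists, then the stored level. One side effect of catching badarg this broadly is that a missing level key in an existing table is treated the same way as a missing table.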


> Timeouts in couch_log are masked, crashes callers
> -------------------------------------------------
>
>                 Key: COUCHDB-761
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-761
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.10.1, 0.10.2, 0.11
>            Reporter: Randall Leeds
>            Priority: Blocker
>             Fix For: 0.10.3, 0.11.1, 1.0
>
>         Attachments: improved-sync-logging-v2.patch, improved-sync-logging.patch
>
>
> Several users have reported seeing crash reports stemming from a function_clause match
> on handle_info in various gen_servers. The offending message looks like {#Ref<>, <integer>}.
> After months of banter and sleuthing, I determined that the likely cause was a late reply
> to a gen_server:call that timed out, with the #Ref being the tag on the response. After it
> came up again today in IRC, kocolosk quickly discovered that the problem appears to be in
> couch_log.erl.
> The logging macros (?LOG_*) call couch_log/*_on which calls get_level_integer/0. When
> this call times out the timeout is eaten and a late reply arrives to the calling process later,
> triggering the crash.
> Suggestions on how to fix this welcome. Ideas so far are async logging or infinite timeout.
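
The late-reply mechanism described above can be reproduced with plain message passing. The sketch below is a hand-rolled imitation of a timed call, not the gen_server implementation itself; the module, the slow_server/1 process and the get_level request are hypothetical names used only for this illustration.

```erlang
-module(late_reply_demo).
-export([demo/0]).

%% Hand-rolled stand-in for a timed call: send a request tagged with a
%% fresh reference, then wait for a reply carrying the same tag.
call(Server, Request, Timeout) ->
    Tag = make_ref(),
    Server ! {call, self(), Tag, Request},
    receive
        {Tag, Reply} -> {ok, Reply}
    after Timeout ->
        %% The timeout is swallowed here, but the request is still pending.
        timeout
    end.

%% Hypothetical slow server: answers with an integer level after a delay.
slow_server(Delay) ->
    receive
        {call, From, Tag, get_level} ->
            timer:sleep(Delay),
            From ! {Tag, 2},
            slow_server(Delay)
    end.

demo() ->
    Server = spawn(fun() -> slow_server(200) end),
    %% The caller gives up after 50 ms and carries on.
    timeout = call(Server, get_level, 50),
    %% The reply still arrives later as an ordinary {Ref, Integer} message.
    Stray = receive
        {Ref, Level} when is_reference(Ref) -> {stray, Level}
    after 1000 ->
        none
    end,
    exit(Server, kill),
    Stray.
```

Calling late_reply_demo:demo() should return {stray, 2}. In a gen_server that has no handle_info clause matching {Ref, Integer}, a message of that shape would produce the function_clause crash reported in this issue.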

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

