couchdb-dev mailing list archives

From "Adam Kocoloski (JIRA)" <>
Subject [jira] Commented: (COUCHDB-761) Timeouts in couch_log are masked, crashes callers
Date Sun, 13 Jun 2010 15:24:13 GMT


Adam Kocoloski commented on COUCHDB-761:

I needed one more patch for get_level_integer() to get make check running, since some of the
tests call couch code that tries to log when couch_log is not running. I've inlined it below.
I've committed on trunk and backported to 0.10.x. Waiting on 0.11.x because Jan has a monster
fix for that branch in the works.

diff --git a/src/couchdb/couch_log.erl b/src/couchdb/couch_log.erl
index 5c8a5e5..2d62cbb 100644
--- a/src/couchdb/couch_log.erl
+++ b/src/couchdb/couch_log.erl
@@ -81,7 +81,11 @@ get_level() ->
 get_level_integer() ->
-    ets:lookup_element(?MODULE, level, 2).
+    try
+        ets:lookup_element(?MODULE, level, 2)
+    catch error:badarg ->
+        ?LEVEL_ERROR
+    end.
 set_level_integer(Int) ->
     gen_event:call(error_logger, couch_log, {set_level_integer, Int}).
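
For readers who want to see why the try/catch is needed: ets:lookup_element/3 raises error:badarg when the named table does not exist, which is the situation when the process that owns the table is not running. The following is a minimal standalone sketch of the same fallback pattern, not the couch_log code itself. The module name, the table name, and the numeric value of ?LEVEL_ERROR are placeholders chosen for illustration.

```erlang
-module(level_lookup_sketch).
-export([get_level_integer/0, demo/0]).

%% Hypothetical stand-ins for illustration; not the real couch_log definitions.
-define(TABLE, level_lookup_sketch_tab).
-define(LEVEL_ERROR, 4).

%% Same shape as the patched function: read the level from ETS and
%% fall back to the error level if the lookup raises badarg.
get_level_integer() ->
    try
        ets:lookup_element(?TABLE, level, 2)
    catch error:badarg ->
        ?LEVEL_ERROR
    end.

demo() ->
    %% The table does not exist yet, so the lookup raises badarg
    %% and the fallback value is returned.
    Before = get_level_integer(),
    %% Create the table and store a level, as a running logger would.
    ?TABLE = ets:new(?TABLE, [named_table, set, protected]),
    true = ets:insert(?TABLE, {level, 1}),
    After = get_level_integer(),
    true = ets:delete(?TABLE),
    {Before, After}.
```

With these placeholder values, level_lookup_sketch:demo() returns {4, 1}. Note that badarg is also raised when the table exists but the key is missing, so the catch covers that case as well.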

> Timeouts in couch_log are masked, crashes callers
> -------------------------------------------------
>                 Key: COUCHDB-761
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.10.1, 0.10.2, 0.11
>            Reporter: Randall Leeds
>            Priority: Blocker
>             Fix For: 0.10.3, 0.11.1, 1.0
>         Attachments: improved-sync-logging-v2.patch, improved-sync-logging.patch
> Several users have reported seeing crash reports stemming from a function_clause match
> on handle_info in various gen_servers. The offending message looks like {#Ref<>, <integer>}.
> After months of banter and sleuthing, I determined that the likely cause was a late reply
> to a gen_server:call that timed out, with the #Ref being the tag on the response. After it
> came up again today in IRC, kocolosk quickly discovered that the problem appears to be in
> The logging macros (?LOG_*)  call couch_log/*_on which calls get_level_integer/0. When
> this call times out the timeout is eaten and a late reply arrives to the calling process later,
> triggering the crash.
> Suggestions on how to fix this welcome. Ideas so far are async logging or infinite timeout.
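
The late-reply mechanism described in the issue can be modelled with plain message passing. The sketch below is a simplified stand-in, not the actual gen_server:call implementation: the module name, slow_server/1, call/3, the get_level request, and the reply value 2 are all hypothetical. It shows a caller that gives up after a timeout and then finds the {Ref, Reply} message in its mailbox once the slow server finally answers.

```erlang
-module(late_reply_sketch).
-export([demo/0]).

%% Hypothetical slow server: answers one request after DelayMs milliseconds.
slow_server(DelayMs) ->
    receive
        {call, {From, Ref}, get_level} ->
            timer:sleep(DelayMs),
            %% The reply is tagged with the caller's reference.
            From ! {Ref, 2}
    end.

%% Simplified call: tag the request with a fresh reference and wait
%% at most Timeout milliseconds for the matching reply.
call(Server, Request, Timeout) ->
    Ref = make_ref(),
    Server ! {call, {self(), Ref}, Request},
    receive
        {Ref, Reply} -> {ok, Reply}
    after Timeout ->
        {error, timeout}
    end.

demo() ->
    Server = spawn(fun() -> slow_server(200) end),
    %% The call times out and the caller carries on, as if the
    %% timeout had been swallowed.
    {error, timeout} = call(Server, get_level, 50),
    %% The server still replies. The tagged message now sits in the
    %% caller's mailbox with nobody waiting for it.
    receive
        {Ref, Level} when is_reference(Ref) ->
            {late_reply, Level}
    after 1000 ->
        no_late_reply
    end.
```

Here late_reply_sketch:demo() returns {late_reply, 2}. In a gen_server, a stray message of this shape is delivered to handle_info, and a callback with no clause for it fails with function_clause, which matches the crash reports above.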

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
