couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Randall Leeds (JIRA)" <>
Subject [jira] Updated: (COUCHDB-761) Timeouts in couch_log are masked, crashes callers
Date Sat, 22 May 2010 00:38:17 GMT


Randall Leeds updated COUCHDB-761:

    Attachment: improved-sync-logging-v2.patch

Here's the second version of the patch.

1) I took Adam's advice and simplified the message format for messages sent by the ?LOG_*
2) I re-ordered the handle_event clauses. The first few handle the new log macro message format,
the next few handle messages coming from sasl
3) I took this opportunity to add a config direction log:include_sasl which defaults to true.
When include_sasl is false the couch_log process will ignore sasl messages. I thought about
making this dependent on log level somehow but decided I prefer to give people options and
have sensible defaults.

What this accomplishes:
1) synchronous logging - log messages are never missed since we use sync_notify now instead
of the notify call error_logger uses
2) log macros for log levels that are disabled should have less performance impact. I don't
know and haven't tested how an ets lookup compares to a gen_server call but at least responses
can be fast even when the log server is busy formatting a long message
3) eliminating the gen:call for couch_log:log_level_integer/0 means no more unexpected {#Ref<...>,
LogLevel} messages arriving after a timeout due to a slow or busy log server process
4) sasl logging can be disabled in production environments where the entire state of crashing
gen_servers is probably waaay TMI when we probably ?LOG_ERROR a more useful message anyway

Open for re-review, comments, questions.

> Timeouts in couch_log are masked, crashes callers
> -------------------------------------------------
>                 Key: COUCHDB-761
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.10.1, 0.10.2, 0.11
>            Reporter: Randall Leeds
>            Priority: Blocker
>             Fix For: 0.10.3, 0.11.1, 1.0
>         Attachments: improved-sync-logging-v2.patch, improved-sync-logging.patch
> Several users have reported seeing crash reports stemming from a function_clause match
on handle_info in various gen_servers. The offending message looks like {#Ref<>, <integer>}.
> After months of banter and sleuthing, I determined that the likely cause was a late reply
to a gen_server:call that timed out, with the #Ref being the tag on the response. After it
came up again today in IRC, kocolosk quickly discovered that the problem appears to be in
> The logging macros (?LOG_*)  call couch_log/*_on which calls get_level_integer/0. When
this call times out the timeout is eaten and a late reply arrives to the calling process later,
triggering the crash.
> Suggestions on how to fix this welcome. Ideas so far are async logging or infinite timeout.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message