zookeeper-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
Date Thu, 27 Jul 2017 05:53:01 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102749#comment-16102749 ]

Ted Dunning commented on ZOOKEEPER-2770:
----------------------------------------

The typical approach is to set a limit on the number of messages per unit of
time (say, one every 10 minutes). Each message that is printed starts a
coalescence period during which no further messages are printed, but a
counter is incremented. At the end of the coalescence period, a modified
message mentioning that n additional events were detected is printed, and
the coalescence period is cleared.

This way if the warnings are rare, you get normal behavior. If the warnings
are frequent, you get at most one message per 10 minutes (or whatever
coalescence period you choose). You get instant notification of a problem
and limited log output.
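The scheme above can be sketched in Java, the language ZooKeeper is written in. This is a minimal sketch under stated assumptions, not ZooKeeper code. The class name `CoalescingLogger`, the injected clock and output sink, and the `poll()` method are all hypothetical. `poll()` is meant to be called by a periodic timer so that the end-of-period summary appears even when no further events arrive. The sketch follows the description literally: after the summary, the period is cleared and the next event prints immediately.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch of the coalescing rate limiter described above; not ZooKeeper API.
final class CoalescingLogger {
    private final long periodMillis;
    private final LongSupplier clockMillis; // injected time source (assumed monotonic, in ms)
    private final Consumer<String> sink;    // where lines go, e.g. a logger's warn method
    private boolean periodOpen = false;
    private long periodEnd;
    private long suppressed = 0;

    CoalescingLogger(long periodMillis, LongSupplier clockMillis, Consumer<String> sink) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        this.periodMillis = periodMillis;
        this.clockMillis = clockMillis;
        this.sink = sink;
    }

    // Report one event: printed at once if no coalescence period is open, otherwise only counted.
    synchronized void warn(String message) {
        poll(); // close an expired period (emitting its summary) before deciding
        if (!periodOpen) {
            sink.accept(message);
            periodOpen = true; // each printed message starts a new coalescence period
            periodEnd = clockMillis.getAsLong() + periodMillis;
        } else {
            suppressed++;
        }
    }

    // Close the period once it has expired, printing the count of suppressed events
    // if there were any; a timer should call this periodically.
    synchronized void poll() {
        if (periodOpen && clockMillis.getAsLong() >= periodEnd) {
            if (suppressed > 0) {
                sink.accept(suppressed + " additional events were detected during the last "
                        + periodMillis + " ms");
            }
            suppressed = 0;
            periodOpen = false;
        }
    }
}
```

In a server, the clock could be `() -> System.nanoTime() / 1_000_000` and the sink an SLF4J logger's `warn` method, with a scheduled task calling `poll()` every few seconds. Under sustained load, this literal reading yields about two lines per period: the summary plus the next event. A variant could reopen the period after a non-empty summary to get strictly one line per period.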


On Wed, Jul 26, 2017 at 10:05 PM, Karan Mehta (JIRA) <jira@apache.org>
wrote:

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why any given
> read or write operation may become slow: a software bug, a protocol problem, a hardware issue
> with the commit log(s), or a network issue. If the problem is constant, it is trivial to
> understand the cause. However, to diagnose intermittent problems we often don't know where,
> or when, to begin looking. We need some sort of timestamped indication of the problem.
> Although ZooKeeper is not a datastore, it does persist data and can suffer intermittent
> performance degradation, so it should consider implementing a 'slow query' log. This feature
> is very common in services that persist information on behalf of clients that may be
> sensitive to latency while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally processing the
> request, that the current time minus the arrival time of the request is beyond a configured
> threshold.
>
> Look at the HBase {{responseTooSlow}} feature for inspiration. 
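The proposed check reduces to comparing a request's age against a threshold at the moment the request is finally processed. The sketch below assumes the server records an arrival timestamp for each request on the same monotonic clock used at processing time. `SlowRequestCheck`, its parameters, and the log-line format are hypothetical, not ZooKeeper's API.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed slow-request check; not ZooKeeper API.
final class SlowRequestCheck {
    private final long thresholdMillis; // the "configured threshold"

    SlowRequestCheck(long thresholdMillis) {
        if (thresholdMillis < 0) {
            throw new IllegalArgumentException("thresholdMillis must be non-negative");
        }
        this.thresholdMillis = thresholdMillis;
    }

    // Called when the request is finally processed. arrivalMillis and nowMillis
    // must come from the same monotonic clock (an assumption of this sketch).
    // Returns a log line only if the latency is strictly beyond the threshold.
    Optional<String> check(String client, String request, long arrivalMillis, long nowMillis) {
        long latencyMillis = nowMillis - arrivalMillis;
        if (latencyMillis <= thresholdMillis) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Slow request: client=%s request=%s latency=%dms threshold=%dms",
                client, request, latencyMillis, thresholdMillis));
    }
}
```

A burst of slow requests would produce a burst of these lines. That makes the output a natural candidate for the coalescing approach described in the comment above.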



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
