zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
Date Thu, 13 Jul 2017 23:32:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086596#comment-16086596 ]

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/307
  
    I think we should consolidate the latency check in `zks.serverStats().updateLatency`.
Having two (or, in future, even more) kinds of latency checks scattered around fragments
the definition of what a request latency means. The existing latency measurement in
ServerStats measures the time between a request's creation and its arrival at the final
request processor; the patch instead measures the end-to-end time of a request, from the
start to the finish of its processing. I am fine with the end-to-end processing time, though
I'd like to double-check with a few folks to make sure the risk of regression and the impact
of this change are limited.
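
    To make the consolidation idea concrete, here is a minimal sketch of a single place that both
records latency statistics and applies the slow-request threshold, so there is only one
definition of "latency". The class `LatencySketch`, its fields, and its method signature are
hypothetical illustrations and are not ZooKeeper's actual ServerStats API.

```java
/**
 * Hypothetical stand-in for a stats holder (NOT ZooKeeper's real ServerStats).
 * Latency is defined exactly once, as nowMs - createTimeMs, and the same value
 * feeds both the aggregate statistics and the slow-request check.
 */
public class LatencySketch {
    private final long slowThresholdMs; // assumed configuration value
    private long count;
    private long totalMs;
    private long minMs = Long.MAX_VALUE;
    private long maxMs;
    private long slowCount;

    public LatencySketch(long slowThresholdMs) {
        this.slowThresholdMs = slowThresholdMs;
    }

    /**
     * Records one request and reports whether it exceeded the threshold.
     * The caller decides which timestamps to pass, which is where the
     * "creation to final processor" vs. "end to end" choice is made.
     */
    public synchronized boolean updateLatency(long createTimeMs, long nowMs) {
        long latency = Math.max(0, nowMs - createTimeMs); // guard against clock skew
        count++;
        totalMs += latency;
        minMs = Math.min(minMs, latency);
        maxMs = Math.max(maxMs, latency);
        boolean slow = latency > slowThresholdMs;
        if (slow) {
            slowCount++;
        }
        return slow;
    }

    public synchronized long getMinLatencyMs() {
        return count == 0 ? 0 : minMs;
    }

    public synchronized long getMaxLatencyMs() {
        return maxMs;
    }

    public synchronized long getAvgLatencyMs() {
        return count == 0 ? 0 : totalMs / count;
    }

    public synchronized long getSlowCount() {
        return slowCount;
    }
}
```

    With this shape, a caller that sees `updateLatency(...)` return true can emit the slow-request
log line, and no second latency computation is needed elsewhere.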
    
    I think ServerStats is a good place to put the data structure Ted recommended.
    
    I think it's a good idea to scope the JIRA so it's easier to get it reviewed and committed.
What this patch does is a positive improvement to the operational aspects of ZK, so that
can be the scope of this PR. On top of that, future improvements could be what Edward and Ted
suggested (JMX, distribution of latencies / histogram, etc.). This work can be tracked as
sub-tasks under the current JIRA.


> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why any given
read or write operation may become slow: a software bug, a protocol problem, a hardware issue
with the commit log(s), a network issue. If the problem is constant, it is trivial to come
to an understanding of the cause. However, to diagnose intermittent problems we often
don't know where, or when, to begin looking. We need some sort of timestamped indication of
the problem. Although ZooKeeper is not a datastore, it does persist data and can suffer intermittent
performance degradation, so it should consider implementing a 'slow query' log, a feature very
common to services that persist information on behalf of clients which may be sensitive to
latency while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally processing the
request, that the current time minus arrival time of the request is beyond a configured threshold.

> Look at the HBase {{responseTooSlow}} feature for inspiration. 
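
The check described in the issue (log the client and request details when current time minus
arrival time exceeds a configured threshold) could look roughly like the sketch below. The
class `SlowOpLogSketch`, its parameters, and the log message format are assumptions made for
illustration; they are not taken from the attached patches or from HBase.

```java
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical slow-operation logger; all names here are illustrative only. */
public class SlowOpLogSketch {
    private static final Logger LOG = Logger.getLogger(SlowOpLogSketch.class.getName());

    private final long thresholdMs; // assumed to come from server configuration

    public SlowOpLogSketch(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /**
     * Called when the server finishes processing a request. Logs and returns a
     * timestamp-friendly message if the request took longer than the threshold,
     * otherwise returns an empty Optional.
     */
    public Optional<String> check(String clientAddr, long sessionId, String opType,
                                  String path, long arrivalMs, long nowMs) {
        long elapsedMs = nowMs - arrivalMs;
        if (elapsedMs <= thresholdMs) {
            return Optional.empty();
        }
        String msg = String.format(
                "slow request: client=%s session=0x%x type=%s path=%s elapsedMs=%d thresholdMs=%d",
                clientAddr, sessionId, opType, path, elapsedMs, thresholdMs);
        LOG.warning(msg); // the logging framework adds the timestamp
        return Optional.of(msg);
    }
}
```

Returning the message as well as logging it is only a convenience for testing the sketch; a
real implementation would more likely just log.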



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
