zookeeper-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
Date Tue, 25 Jul 2017 19:24:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100603#comment-16100603 ]

Ted Dunning commented on ZOOKEEPER-2770:


I am not so sure that *I* agree with me at this point.

It is fair to say that on occasion there are slow operations in ZK and it would be good to
know about them. 

In my own vicarious experience, this kind of problem is almost always due to bad configuration.
Often the bad configuration is simply collocation with a noisy neighbor on a deficient storage
layer. There might be situations where an operation is slow due to the content of the query
itself, but I cannot imagine what those situations might be. Writing a large value (whose size
is strictly limited), or even doing a huge multi-op (which has the same size limit in
aggregate), should never take very long.

As such, I would expect that the highest diagnostic value would not be something that dumped
the contents of slow queries, but rather a capability that characterizes the entire distribution
of query times. The frequency of slow queries is a diagnostic of sorts, but is one that could
be inferred from the time-varying distributional information I was suggesting.
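
To make the idea of a time-varying distribution concrete, here is a minimal sketch. It is not ZooKeeper code and does not come from any of the attached patches; the class name `LatencyHistogram`, the log-scaled bucketing, and the per-interval `snapshot_and_reset` call are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class LatencyHistogram:
    """Hypothetical log-bucketed histogram of request latencies (illustration only)."""

    def __init__(self, buckets_per_decade=5):
        self.buckets_per_decade = buckets_per_decade
        self.counts = Counter()
        self.total = 0

    def _bucket(self, latency_ms):
        # Clamp tiny values so log10 is always defined.
        return math.floor(math.log10(max(latency_ms, 0.001)) * self.buckets_per_decade)

    def _upper_bound_ms(self, bucket):
        return 10 ** ((bucket + 1) / self.buckets_per_decade)

    def record(self, latency_ms):
        self.counts[self._bucket(latency_ms)] += 1
        self.total += 1

    def quantile(self, q):
        """Upper bound (ms) of the bucket that contains the q-th quantile."""
        if self.total == 0:
            return None
        target = q * self.total
        seen = 0
        for bucket in sorted(self.counts):
            seen += self.counts[bucket]
            if seen >= target:
                return self._upper_bound_ms(bucket)
        # Only reached if q > 1; fall back to the largest bucket.
        return self._upper_bound_ms(max(self.counts))

    def snapshot_and_reset(self):
        """Summarize the current interval, then start a fresh one."""
        summary = {
            "count": self.total,
            "p50_ms": self.quantile(0.5),
            "p99_ms": self.quantile(0.99),
            "max_ms": self.quantile(1.0),
        }
        self.counts.clear()
        self.total = 0
        return summary
```

Calling `snapshot_and_reset()` once per reporting interval would yield a series of summaries. A rising tail quantile across intervals then shows the frequency of slow operations without recording the contents of any individual request.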

That said, I don't think that a slow query log is a BAD thing (except a bit bad in terms of
security if it logs the actual query). And I wouldn't want the BEST thing (a distribution
log) to stop somebody contributing something.

> ZooKeeper slow operation log
> ----------------------------
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch
>
> ZooKeeper is a complex distributed application. There are many reasons why any given
> read or write operation may become slow: a software bug, a protocol problem, a hardware issue
> with the commit log(s), a network issue. If the problem is constant it is trivial to come
> to an understanding of the cause. However in order to diagnose intermittent problems we often
> don't know where, or when, to begin looking. We need some sort of timestamped indication of
> the problem. Although ZooKeeper is not a datastore, it does persist data, and can suffer intermittent
> performance degradation, and should consider implementing a 'slow query' log, a feature very
> common to services which persist information on behalf of clients which may be sensitive to
> latency while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally processing the
> request, that the current time minus arrival time of the request is beyond a configured threshold.
>
> Look at the HBase {{responseTooSlow}} feature for inspiration.
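
The threshold check described in the issue can be sketched roughly as follows. This is an illustrative assumption, not the logic of the attached patches: the function name `log_if_slow`, its parameters, and the log format are hypothetical. In line with the security concern raised in the comment above, the sketch logs only the client and operation type, not the request payload.

```python
import logging
import time

logger = logging.getLogger("slow_request_sketch")


def log_if_slow(client, op, arrival_ns, threshold_ms, now_ns=None):
    """Log a warning if (now - arrival) exceeds threshold_ms; return True if logged.

    All names here are hypothetical. Timestamps are monotonic nanoseconds.
    """
    if now_ns is None:
        now_ns = time.monotonic_ns()
    elapsed_ms = (now_ns - arrival_ns) / 1_000_000
    if elapsed_ms > threshold_ms:
        # Deliberately omit the request contents; log only who and what kind.
        logger.warning(
            "slow request: client=%s op=%s elapsed_ms=%.1f threshold_ms=%s",
            client, op, elapsed_ms, threshold_ms,
        )
        return True
    return False
```

A server would capture `time.monotonic_ns()` when the request arrives and call `log_if_slow` at the point where the request is finally processed.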

This message was sent by Atlassian JIRA