Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 9CA96200CD0 for ; Tue, 25 Jul 2017 22:21:06 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 9B4671677B0; Tue, 25 Jul 2017 20:21:06 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id DDBBC1677B6 for ; Tue, 25 Jul 2017 22:21:05 +0200 (CEST) Received: (qmail 73373 invoked by uid 500); 25 Jul 2017 20:21:04 -0000 Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@zookeeper.apache.org Delivered-To: mailing list dev@zookeeper.apache.org Received: (qmail 73354 invoked by uid 99); 25 Jul 2017 20:21:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Jul 2017 20:21:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 6C57DC186C for ; Tue, 25 Jul 2017 20:21:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.002 X-Spam-Level: X-Spam-Status: No, score=-100.002 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id lmHoXs2zqP3Z for ; Tue, 25 Jul 2017 20:21:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 6362D5F479 for ; Tue, 25 Jul 2017 20:21:02 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 8DECFE0E40 for ; Tue, 25 Jul 2017 20:21:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id A8FB223F77 for ; Tue, 25 Jul 2017 20:21:00 +0000 (UTC) Date: Tue, 25 Jul 2017 20:21:00 +0000 (UTC) From: "Andrew Purtell (JIRA)" To: dev@zookeeper.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 25 Jul 2017 20:21:06 -0000 [ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100706#comment-16100706 ] Andrew Purtell commented on ZOOKEEPER-2770: ------------------------------------------- The originally proposed change is hardly complex. I don't understand that aspect of this discussion. Whether or not the metric is useful, on the other hand... ok. That is a matter of opinion. I think we'd like to know if any ZK op takes longer than a second to complete, and how often that might happen, and on what host it is happening. We have fleet of thousands of servers. We have tens of ZooKeeper installations, each on five servers. Hardware does funny things from time to time. We'd like to be proactive. > ZooKeeper slow operation log > ---------------------------- > > Key: ZOOKEEPER-2770 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770 > Project: ZooKeeper > Issue Type: Improvement > Reporter: Karan Mehta > Assignee: Karan Mehta > Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch > > > ZooKeeper is a complex distributed application. There are many reasons why any given read or write operation may become slow: a software bug, a protocol problem, a hardware issue with the commit log(s), a network issue. If the problem is constant it is trivial to come to an understanding of the cause. However in order to diagnose intermittent problems we often don't know where, or when, to begin looking. We need some sort of timestamped indication of the problem. Although ZooKeeper is not a datastore, it does persist data, and can suffer intermittent performance degradation, and should consider implementing a 'slow query' log, a feature very common to services which persist information on behalf of clients which may be sensitive to latency while waiting for confirmation of successful persistence. > Log the client and request details if the server discovers, when finally processing the request, that the current time minus arrival time of the request is beyond a configured threshold. > Look at the HBase {{responseTooSlow}} feature for inspiration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)