Date: Mon, 23 Oct 2017 16:32:00 +0000 (UTC)
From: "Allen Wittenauer (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-14972) Histogram metrics types for latency, etc.

    [ https://issues.apache.org/jira/browse/HADOOP-14972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215405#comment-16215405 ]

Allen Wittenauer commented on HADOOP-14972:
-------------------------------------------

It's not well documented or understood, but there are a lot of latency metrics that already have a level of binning using *.metrics.percentiles.intervals.

> Histogram metrics types for latency, etc.
> -----------------------------------------
>
>                 Key: HADOOP-14972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14972
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> We'd like metrics to track latencies for various operations, such as latencies for various request types. This may need to be done differently from the current metrics types, which are just counters of type long, and it needs to be done intelligently, as these measurements are very numerous and are primarily interesting because of the outliers that are unpredictably far from normal.
> A few ideas on how we might implement something like this:
> * An adaptive, sparse histogram type. I envision something configurable with a maximum granularity and a maximum number of bins. Initially, datapoints are tallied in bins with the maximum granularity. As we reach the maximum number of bins, bins are merged in even / odd pairs. There's some complexity here, especially to make it perform well and allow safe concurrency, but I like the ability to configure reasonable limits and retain as much granularity as possible without knowing the exact shape of the data beforehand.
> * LongMetrics named "read_latency_600ms", "read_latency_800ms" to represent bins. This was suggested to me by [~fabbri]. I initially did not like the idea of having so many hard-coded bins for however many op types, but this could also be done dynamically (we just hard-code which measurements we take, and with what granularity to group them, e.g. read_latency, 200 ms). The resulting dataset could be sparse and dynamic to allow for extreme outliers, but the granularity is still pre-determined.
> * We could also simply track a certain number of the highest latencies, and basic descriptive statistics like a running average, min / max, etc. This is inherently more limited in what it can show us, but it is much simpler and might still provide some insight when analyzing performance.
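
To make the first idea in the description (the adaptive, sparse histogram) more concrete, here is a minimal sketch of how the even / odd pair merging might work. It is only an illustration under stated assumptions: the class name AdaptiveHistogram and its methods are hypothetical and are not part of Hadoop's metrics API, values are assumed to be non-negative latencies, and thread safety is handled with plain synchronized methods rather than anything tuned for performance.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of an adaptive, sparse histogram.
 * Not an existing Hadoop class; names and signatures are assumptions.
 */
class AdaptiveHistogram {
  private final int maxBins;
  // Current bin width, e.g. in milliseconds. Starts at the finest granularity.
  private long binWidth;
  // Sparse storage: bin index -> count. Empty bins are never stored.
  private TreeMap<Long, Long> bins = new TreeMap<>();

  AdaptiveHistogram(long finestBinWidth, int maxBins) {
    if (finestBinWidth <= 0 || maxBins < 2) {
      throw new IllegalArgumentException("need finestBinWidth > 0 and maxBins >= 2");
    }
    this.binWidth = finestBinWidth;
    this.maxBins = maxBins;
  }

  /** Tally one non-negative measurement, coarsening the bins if the limit is exceeded. */
  synchronized void add(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("value must be non-negative");
    }
    bins.merge(value / binWidth, 1L, Long::sum);
    while (bins.size() > maxBins) {
      mergePairs();
    }
  }

  /** Merge bins in even / odd pairs: indices 2k and 2k+1 both map to k, and the width doubles. */
  private void mergePairs() {
    TreeMap<Long, Long> merged = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      merged.merge(e.getKey() / 2, e.getValue(), Long::sum);
    }
    bins = merged;
    binWidth *= 2;
  }

  synchronized long binWidth() {
    return binWidth;
  }

  /** Copy of the histogram keyed by each bin's lower bound rather than its index. */
  synchronized TreeMap<Long, Long> snapshot() {
    TreeMap<Long, Long> out = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      out.put(e.getKey() * binWidth, e.getValue());
    }
    return out;
  }
}
```

For example, with a finest granularity of 100 ms and at most 4 bins, adding 50, 150, 250 and 350 fills four bins; adding 950 would create a fifth, so the pairs are merged and the bin width becomes 200 ms. A single far outlier therefore costs some granularity across the whole range, which is the trade-off the description alludes to.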