Date: Tue, 22 Mar 2016 10:11:25 +0000 (UTC)
From: "Steve Loughran (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

    [ https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206107#comment-15206107 ]

Steve Loughran commented on HADOOP-12949:
-----------------------------------------

There's actually some metrics collection in openstack swift; look under {{org.apache.hadoop.fs.swift.util.DurationStats}}; they log primarily to stdout and list min, max, (moving) arithmetic mean, and stddev by HTTP verb.
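For anyone who hasn't looked at that class, here is a rough sketch of what per-verb duration statistics can look like. It is not the hadoop-openstack code: the class name {{VerbDurationStats}}, its methods, and the report format are all made up for illustration, and it keeps a running mean and stddev over every sample (Welford's method) rather than a moving mean.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; NOT the hadoop-openstack DurationStats API. */
final class VerbDurationStats {

  /** Running statistics for the operations of one HTTP verb. */
  static final class Stats {
    long count;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    double mean;
    double m2; // sum of squared deviations from the running mean

    /** Fold one duration (in milliseconds) into the statistics. */
    void add(long millis) {
      count++;
      min = Math.min(min, millis);
      max = Math.max(max, millis);
      // Welford's update: numerically stable, no need to keep samples.
      double delta = millis - mean;
      mean += delta / count;
      m2 += delta * (millis - mean);
    }

    /** Sample standard deviation; 0 until there are two samples. */
    double stddev() {
      return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
    }
  }

  // Sorted map so the report lists verbs in a stable order.
  private final Map<String, Stats> byVerb = new TreeMap<>();

  /** Record the duration of one operation, keyed by HTTP verb. */
  synchronized void record(String verb, long millis) {
    byVerb.computeIfAbsent(verb, v -> new Stats()).add(millis);
  }

  /** Returns the live statistics for a verb, or null if none were recorded. */
  synchronized Stats get(String verb) {
    return byVerb.get(verb);
  }

  /** One line per verb, suitable for logging to stdout. */
  synchronized String report() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Stats> e : byVerb.entrySet()) {
      Stats s = e.getValue();
      sb.append(String.format(Locale.ROOT,
          "%s count=%d min=%d max=%d mean=%.2f stddev=%.2f%n",
          e.getKey(), s.count, s.min, s.max, s.mean, s.stddev()));
    }
    return sb.toString();
  }
}
```

A caller would time each HTTP request and call {{record("DELETE", elapsedMillis)}} when it completes. The coarse {{synchronized}} methods keep the sketch short; a real filesystem client would probably want something cheaper under contention.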
# It's pretty low cost to do this; even when hbase sampling is inactive, the stats for an FS can be collected.
# The stats showed that rackspace UK throttles delete requests; the more files in a directory I was cleaning up on teardown, the longer it took, and the growth was exponential rather than linear.
# I didn't hook the code up to the normal hadoop metrics; it's something I'd add as an option now, because it does become something you need to monitor now that we are shifting to longer-lived applications.
# I'd add more on causes of operations, specifically: open(), seek(), duration of close(), delete(); things where the fact that object stores are generally O(files*data) means they don't work as expected ... finding that mismatch of expectations matters.

More and more object stores are coming in. While s3 is the main one, it'd be good to have the core stuff store-neutral. The classes from hadoop-openstack can be moved if that helps; the per-verb stuff is useful at the deep levels, while htrace monitoring can track the cost of specific actions.

> Add HTrace to the s3a connector
> -------------------------------
>
>                 Key: HADOOP-12949
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12949
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Madhawa Gunasekara
>
> Hi All,
> s3, GCS, WASB, and other cloud blob stores are becoming increasingly important in Hadoop. But we don't have distributed tracing for these yet. It would be interesting to add distributed tracing here. It would enable collecting really interesting data like probability distributions of PUT and GET requests to s3 and their impact on MR jobs, etc.
> I would like to implement this feature. Please shed some light on this.
> Thanks,
> Madhawa

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)