Date: Mon, 21 Mar 2016 20:44:25 +0000 (UTC)
From: "Mingliang Liu (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205086#comment-15205086 ]

Mingliang Liu commented on HDFS-10175:
--------------------------------------

Thanks for your comment, [~andrew.wang]. I was aware of the thread-local statistics data structure and was in favor of following the same approach. The new operation map is still per-thread.
The ConcurrentHashMap was used because, when aggregating, we have to make sure the map is not modified unsafely while it is being read. Its role is similar to that of the "volatile" keyword for the other, primitive statistics data. Anyway, I will revisit the code and, for the sake of performance, update the patch if ConcurrentHashMap turns out to be unnecessary. Before that, the next patch will first resolve the conflicts with trunk caused by [HDFS-9579].

> add per-operation stats to FileSystem.Statistics
> ------------------------------------------------
>
>                 Key: HDFS-10175
>                 URL: https://issues.apache.org/jira/browse/HDFS-10175
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10175.000.patch
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. There is logic within DfsClient to map operations to these counters that can be confusing; for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation, including create, append, createSymlink, delete, exists, mkdirs, and rename, and expose them as new properties on the Statistics object. The operation-specific counters can be used for analyzing the load imposed by a particular job on HDFS.
> For example, we can use them to identify jobs that end up creating a large number of files.
> Once this information is available in the Statistics object, app frameworks like MapReduce can expose them as additional counters to be aggregated and recorded as part of the job summary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
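The design discussed above (per-thread statistics data, a volatile primitive counter, and a per-thread ConcurrentHashMap of operation counters that an aggregating thread can read safely) can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not the HDFS-10175 patch: the class `PerOpStatsSketch`, the `Op` enum, and the `onMkdirs`/`onCreate` hooks are illustrative names, and registering each thread's data in a `CopyOnWriteArrayList` is an assumption made for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical sketch of per-thread statistics with per-operation counters.
 * Class, enum and method names are illustrative only, not Hadoop's API.
 */
public class PerOpStatsSketch {
    /** Illustrative operation keys, taken from the issue description. */
    enum Op { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

    /** Data owned by one thread; only that thread ever writes to it. */
    static final class ThreadData {
        // Single writer, so volatile is enough to make updates visible to readers.
        volatile long writeOps;
        // An aggregating thread may read this map while the owner updates it;
        // ConcurrentHashMap makes such concurrent reads safe.
        final Map<Op, Long> opCounts = new ConcurrentHashMap<>();
    }

    // Registry of every thread's data so the aggregator can visit all of it.
    private final List<ThreadData> allData = new CopyOnWriteArrayList<>();
    private final ThreadLocal<ThreadData> threadData = ThreadLocal.withInitial(() -> {
        ThreadData d = new ThreadData();
        allData.add(d);
        return d;
    });

    /** Hypothetical client hook: mkdirs is a write op and also its own operation. */
    void onMkdirs() { recordWriteOp(Op.MKDIRS); }

    /** Hypothetical client hook for create. */
    void onCreate() { recordWriteOp(Op.CREATE); }

    // Bumps both the coarse write-op counter and the operation-specific one.
    private void recordWriteOp(Op op) {
        ThreadData d = threadData.get();
        d.writeOps++;                         // safe: this thread is the only writer
        d.opCounts.merge(op, 1L, Long::sum);  // per-operation counter
    }

    /** Sums the coarse write-op counter across all threads. */
    long getWriteOps() {
        long total = 0;
        for (ThreadData d : allData) total += d.writeOps;
        return total;
    }

    /** Sums one operation's counter across all threads. */
    long getOpCount(Op op) {
        long total = 0;
        for (ThreadData d : allData) total += d.opCounts.getOrDefault(op, 0L);
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        PerOpStatsSketch stats = new PerOpStatsSketch();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 3; i++) stats.onMkdirs(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 5; i++) stats.onCreate(); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("writeOps=" + stats.getWriteOps());        // writeOps=8
        System.out.println("mkdirs=" + stats.getOpCount(Op.MKDIRS));  // mkdirs=3
        System.out.println("create=" + stats.getOpCount(Op.CREATE));  // create=5
    }
}
```

Running `main` prints `writeOps=8`, `mkdirs=3` and `create=5`: the coarse `writeOps` counter lumps mkdirs and create together, while the per-operation counters keep them apart, which is the distinction the proposed enhancement is after. Note that the volatile `writeOps++` is only correct because each thread is the sole writer of its own `ThreadData`; a counter shared between threads would need an atomic type instead.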