Reply-To: hdfs-issues@hadoop.apache.org
Date: Mon, 21 Mar 2016 20:44:25 +0000 (UTC)
From: "Mingliang Liu (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12951000.1458162347000.9481.1458593065580@Atlassian.JIRA>
In-Reply-To: <JIRA.12951000.1458162347000@Atlassian.JIRA>
References: <JIRA.12951000.1458162347000@Atlassian.JIRA>
 <JIRA.12951000.1458162347692@arcas>
Subject: [jira] [Commented] (HDFS-10175) add per-operation stats to
 FileSystem.Statistics


    [ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205086#comment-15205086 ] 

Mingliang Liu commented on HDFS-10175:
--------------------------------------

Thanks for your comment, [~andrew.wang]. I was aware of the thread-local statistics data structure and was in favor of following the same approach. The new operation map is still per-thread. ConcurrentHashMap was used because, during aggregation, another thread reads each thread's map, so we have to make sure the map can be read safely while it may be modified. Its role is similar to that of the "volatile" keyword for the other primitive statistics data.
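
The per-thread scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in HDFS-10175.000.patch: the class `PerThreadOpStats` and its methods are hypothetical. Each thread owns a data object, reached through a `ThreadLocal`, and only that thread writes to it. An aggregating thread may iterate over every thread's data at any time. The `ConcurrentHashMap` makes that cross-thread iteration safe in the same way that `volatile` makes the primitive counters visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical sketch: per-thread statistics whose operation map is written only
 * by its owning thread but may be read by an aggregating thread at any time.
 */
final class PerThreadOpStats {

  /** Statistics owned by exactly one thread. */
  static final class ThreadData {
    // Primitive statistic: single writer, so volatile is enough for visibility.
    volatile long bytesRead;
    // Per-operation counts: ConcurrentHashMap lets other threads iterate it safely.
    final ConcurrentHashMap<String, Long> opCounts = new ConcurrentHashMap<>();
  }

  // Every thread's data is registered here so the aggregator can reach it.
  private final CopyOnWriteArrayList<ThreadData> allData = new CopyOnWriteArrayList<>();

  private final ThreadLocal<ThreadData> threadData = ThreadLocal.withInitial(() -> {
    ThreadData data = new ThreadData();
    allData.add(data);
    return data;
  });

  /** Hot path: called only by the thread performing the operation. */
  void incrementOp(String op) {
    threadData.get().opCounts.merge(op, 1L, Long::sum);
  }

  void incrementBytesRead(long n) {
    ThreadData data = threadData.get();
    data.bytesRead += n; // not atomic, but safe because only this thread writes it
  }

  /** May be called from any thread; iteration over each map is weakly consistent. */
  Map<String, Long> aggregateOps() {
    Map<String, Long> totals = new HashMap<>();
    for (ThreadData data : allData) {
      for (Map.Entry<String, Long> e : data.opCounts.entrySet()) {
        totals.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return totals;
  }

  long aggregateBytesRead() {
    long sum = 0;
    for (ThreadData data : allData) {
      sum += data.bytesRead;
    }
    return sum;
  }
}
```

In this sketch, the owning thread is the only writer to its map. The `ConcurrentHashMap` exists only so that `aggregateOps` can iterate while that owner is updating it, because iterating a plain `HashMap` during concurrent modification is unsafe. The sketch also never unregisters data for threads that have exited.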

Anyway, for the sake of performance, I will revise the code and update the patch if ConcurrentHashMap turns out to be unnecessary. Before that, the next patch will first resolve the conflicts with trunk introduced by [HDFS-9579].

> add per-operation stats to FileSystem.Statistics
> ------------------------------------------------
>
>                 Key: HDFS-10175
>                 URL: https://issues.apache.org/jira/browse/HDFS-10175
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10175.000.patch
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. The logic within DfsClient that maps operations to these counters can be confusing; for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation, including create, append, createSymlink, delete, exists, mkdirs and rename, and expose them as new properties on the Statistics object. The operation-specific counters can be used to analyze the load that a particular job imposes on HDFS.
> For example, we can use them to identify jobs that end up creating a large number of files.
> Once this information is available in the Statistics object, application frameworks such as MapReduce can expose it as additional counters to be aggregated and recorded as part of the job summary.
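
The proposal in the issue description could look roughly like the following sketch. It is hypothetical and is not the API that this issue's patch adds. The names `OperationStatistics`, `OpType`, `record` and `snapshot` are illustrative only. Each operation named in the proposal gets its own counter. A coarse `writeOps` counter keeps the current behavior of folding several operations together, so that `mkdirs` still counts as a write op, and both views appear side by side.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: per-operation counters next to a coarse writeOps counter. */
final class OperationStatistics {

  /** The DfsClient operations listed in the proposal. */
  enum OpType { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

  // Populated once in the constructor; only the AtomicLong values change later.
  private final EnumMap<OpType, AtomicLong> opCounts = new EnumMap<>(OpType.class);
  // Coarse counter in the style of the existing WriteOps statistic.
  private final AtomicLong writeOps = new AtomicLong();

  OperationStatistics() {
    for (OpType op : OpType.values()) {
      opCounts.put(op, new AtomicLong());
    }
  }

  /**
   * Records one operation. The caller decides whether it also counts toward the
   * coarse writeOps total (mkdirs, for instance, currently does).
   */
  void record(OpType op, boolean countsAsWriteOp) {
    opCounts.get(op).incrementAndGet();
    if (countsAsWriteOp) {
      writeOps.incrementAndGet();
    }
  }

  long getOpCount(OpType op) {
    return opCounts.get(op).get();
  }

  long getWriteOps() {
    return writeOps.get();
  }

  /** Name-to-value snapshot that a framework could publish as job counters. */
  Map<String, Long> snapshot() {
    Map<String, Long> out = new LinkedHashMap<>();
    for (Map.Entry<OpType, AtomicLong> e : opCounts.entrySet()) {
      out.put(e.getKey().name(), e.getValue().get());
    }
    out.put("WRITE_OPS", writeOps.get());
    return out;
  }
}
```

Take a job that issues two mkdirs calls and one create. With only the coarse counter, it reports writeOps = 3, the same value as a job that creates three files. The per-operation view separates the two cases, which is what identifying jobs that create many files requires.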

