hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
Date Mon, 25 Apr 2016 21:21:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257074#comment-15257074 ]

Colin Patrick McCabe commented on HDFS-10175:
---------------------------------------------

We already have three statistics interfaces:
1. FileSystem#Statistics
2. DFSInputStream#ReadStatistics
3. metrics2 etc.

#1 has existed for a very long time and is tied into MR in the ways discussed above.  I didn't
create it, but I did implement the thread-local optimization, in response to some performance
issues we were having.
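
A minimal sketch of the thread-local idea, for readers unfamiliar with it. This is
not the Hadoop code: the class name ThreadLocalCounter and its methods are
hypothetical, and it leaves out details a real implementation needs, such as
folding in the counts of threads that have exited. Each thread updates its own
slot on the hot path, and a reader sums all slots when it needs a total.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified illustration; not the Hadoop implementation. */
public class ThreadLocalCounter {
    /** Per-thread slot; only the owning thread ever writes to it. */
    private static final class Slot {
        volatile long value;
    }

    /** Every slot ever created, so that readers can aggregate them. */
    private final List<Slot> allSlots = new ArrayList<>();

    /** Lazily creates and registers a slot the first time a thread counts. */
    private final ThreadLocal<Slot> local = ThreadLocal.withInitial(() -> {
        Slot s = new Slot();
        synchronized (allSlots) {
            allSlots.add(s);
        }
        return s;
    });

    /** Hot path: no lock is taken, because each slot has a single writer. */
    public void increment(long delta) {
        Slot s = local.get();
        s.value = s.value + delta;
    }

    /** Slow path: sum the slots of all threads under the registry lock. */
    public long sum() {
        long total = 0;
        synchronized (allSlots) {
            for (Slot s : allSlots) {
                total += s.value;
            }
        }
        return total;
    }
}
```

The trade-off in this sketch is that increments are cheap while reads are
comparatively expensive, which suits counters that are updated on every I/O call
but read only occasionally.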

I have to take the blame for adding #2, in HDFS-4698.  At the time, the main focus was on
ensuring we were doing short-circuit reads, which didn't really fit into #1.  And like you,
I felt that it was "very low-level stream behavior" that was decoupled from the rest of the
stats.

Of course #3 has been around a while, and is used more generally than just in our storage
code.

I understand your eagerness to get the S3 stats in, but I would rather not proliferate more
statistics interfaces if possible.  Once they're in, we really can't get rid of them, and
the API becomes very confusing and clunky.

Are you guys free for a webex on Wednesday afternoon?  Maybe 12:30pm to 2pm?

> add per-operation stats to FileSystem.Statistics
> ------------------------------------------------
>
>                 Key: HDFS-10175
>                 URL: https://issues.apache.org/jira/browse/HDFS-10175
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. There is
> logic within DfsClient to map operations to these counters that can be confusing; for
> instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation, including create, append, createSymlink,
> delete, exists, mkdirs, and rename, and expose them as new properties on the Statistics
> object. The operation-specific counters can be used for analyzing the load imposed by a
> particular job on HDFS.
> For example, we can use them to identify jobs that end up creating a large number of
> files.
> Once this information is available in the Statistics object, app frameworks like
> MapReduce can expose them as additional counters to be aggregated and recorded as part of
> the job summary.
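
The proposal quoted above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the attached patch: the class
OpCounters, the enum Op, and the method names are hypothetical, and the
operation list is just the one named in the description.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-operation counters; not the API from the patch. */
public class OpCounters {
    /** Operation names taken from the issue description. */
    public enum Op { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

    /** One adder per operation; the map itself is never modified after construction. */
    private final Map<Op, LongAdder> counts = new EnumMap<>(Op.class);

    public OpCounters() {
        for (Op op : Op.values()) {
            counts.put(op, new LongAdder());
        }
    }

    /** Record one invocation of the given operation. */
    public void increment(Op op) {
        counts.get(op).increment();
    }

    /** Current count for a single operation. */
    public long get(Op op) {
        return counts.get(op).sum();
    }

    /** Copy of all counts, e.g. for a framework to publish as job counters. */
    public Map<Op, Long> snapshot() {
        Map<Op, Long> out = new EnumMap<>(Op.class);
        for (Map.Entry<Op, LongAdder> e : counts.entrySet()) {
            out.put(e.getKey(), e.getValue().sum());
        }
        return out;
    }
}
```

With counters like these, a job that creates a large number of files would show
up as a high CREATE count in the snapshot, instead of being folded into a
generic write-operation total.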



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
