hadoop-common-issues mailing list archives

From "Kay Ousterhout (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-11873) Include disk read/write time in FileSystem.Statistics
Date Thu, 23 Apr 2015 21:21:39 GMT
Kay Ousterhout created HADOOP-11873:

             Summary: Include disk read/write time in FileSystem.Statistics
                 Key: HADOOP-11873
                 URL: https://issues.apache.org/jira/browse/HADOOP-11873
             Project: Hadoop Common
          Issue Type: New Feature
          Components: metrics
            Reporter: Kay Ousterhout
            Priority: Minor

Measuring the time spent blocked on reading data from or writing data to disk is very useful
for debugging performance problems in applications that read data from Hadoop, and can give
much more information (e.g., to reflect disk contention) than just knowing the total amount
of data read.  I'd like to add something like "diskMillis" to FileSystem#Statistics to track
this.

For data read from HDFS, this can be done with very low overhead by adding logging around
calls to RemoteBlockReader2.readNextPacket (because this reads larger chunks of data, the
time added by the instrumentation is very small relative to the time to actually read the
data).  For data written to HDFS, this can be done in DFSOutputStream.waitAndQueueCurrentPacket.
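
As a rough illustration of the read-side idea, here is a minimal sketch that times each bulk read and adds the elapsed time to a shared counter. It is not Hadoop code: the class name TimedInputStream and the readNanos counter are hypothetical, and a plain FilterInputStream stands in for the instrumentation that would wrap RemoteBlockReader2.readNextPacket.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch (not part of Hadoop): accumulates the time spent
 * blocked in bulk reads on the wrapped stream.
 */
public class TimedInputStream extends FilterInputStream {
    // Assumed stand-in for a "diskMillis"-style statistic, kept in nanoseconds.
    private final LongAdder readNanos;

    public TimedInputStream(InputStream in, LongAdder readNanos) {
        super(in);
        this.readNanos = readNanos;
    }

    // Only bulk reads are timed: two System.nanoTime() calls are cheap
    // relative to reading a large chunk. Single-byte read() is left untimed.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        long start = System.nanoTime();
        try {
            return in.read(buf, off, len);
        } finally {
            // Record the elapsed time even if the read throws.
            readNanos.add(System.nanoTime() - start);
        }
    }
}
```

A caller would wrap its stream once, read through the wrapper as usual, and convert readNanos.sum() to milliseconds when reporting. The write side could follow the same pattern around the blocking call.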

As far as I know, if you want this information today, it is only accessible by turning
on HTrace. It looks like HTrace can't be selectively enabled, so a user can't just turn on
the tracing on RemoteBlockReader2.readNextPacket, for example, and instead needs to turn on
tracing everywhere (which then introduces a bunch of overhead -- so sampling is necessary).
It would be hugely helpful to have native metrics for time spent reading from / writing to
disk that are sufficiently low-overhead to be always on. (Please correct me if I'm wrong here
about what's possible today!)

