hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9579) Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
Date Sat, 19 Dec 2015 03:40:46 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HDFS-9579:
    Attachment: HDFS-9579.patch

The draft patch has the following changes.

* Fix {{NetworkTopology}}'s {{getDistance}} so that node comparison is not based only on object
reference; the distance between two {{NodeBase}}s should be zero whenever they have
the same network path.
* Add new metrics to {{FileSystem.StatisticsData}} to track bytes read for each distance value.
* Have {{DFSInputStream}} update the new metrics.
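
As a rough illustration of the first two changes, here is a minimal self-contained sketch. It is not code from the patch: the class {{DistanceMetricsSketch}}, its methods, and the example paths are all hypothetical. It assumes a location is a slash-separated path such as /dc1/rack1/host1 and that distance is the number of hops from each node up to their closest common ancestor. Paths are compared by value, so two distinct objects with the same path are at distance zero.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; these are not the classes touched by the patch. */
final class DistanceMetricsSketch {

    /** Bytes read so far, keyed by network distance (kept sorted for printing). */
    private final Map<Integer, Long> bytesReadByDistance = new TreeMap<>();

    /**
     * Distance between two locations: hops from each one up to the closest
     * common ancestor. Components are compared by value, not by reference,
     * so equal paths always yield zero.
     */
    static int distance(String pathA, String pathB) {
        String[] a = split(pathA);
        String[] b = split(pathB);
        int common = 0;
        while (common < a.length && common < b.length
                && a[common].equals(b[common])) {
            common++;
        }
        return (a.length - common) + (b.length - common);
    }

    /** Splits "/dc1/rack1/host1" into ["dc1", "rack1", "host1"]. */
    private static String[] split(String path) {
        String trimmed = path.replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/+");
    }

    /** Adds the bytes of one read to the counter for its distance. */
    void recordRead(String clientPath, String datanodePath, long bytes) {
        bytesReadByDistance.merge(distance(clientPath, datanodePath), bytes, Long::sum);
    }

    /** Returns the bytes read at the given distance, or 0 if none. */
    long bytesReadAt(int distance) {
        return bytesReadByDistance.getOrDefault(distance, 0L);
    }

    public static void main(String[] args) {
        DistanceMetricsSketch stats = new DistanceMetricsSketch();
        String client = "/dc1/rack1/host1";
        stats.recordRead(client, "/dc1/rack1/host1", 100); // same node
        stats.recordRead(client, "/dc1/rack1/host2", 200); // same rack
        stats.recordRead(client, "/dc1/rack2/host3", 300); // same DC, remote rack
        stats.recordRead(client, "/dc2/rack9/host4", 400); // other DC
        System.out.println(stats.bytesReadByDistance);
        // prints {0=100, 2=200, 4=300, 6=400}
    }
}
```

With three-level paths like these, distance 0 is a local read, 2 is same-rack, 4 is local-DC-remote-rack and 6 is cross-DC, which is the breakdown the description asks for.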

> Provide bytes-read-by-network-distance metrics at FileSystem.Statistics level
> -----------------------------------------------------------------------------
>                 Key: HDFS-9579
>                 URL: https://issues.apache.org/jira/browse/HDFS-9579
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-9579.patch
> For cross-DC distcp or other applications, it is useful to have insight into the
traffic volume for each network distance to distinguish cross-DC traffic, local-DC-remote-rack,
> FileSystem's existing {{bytesRead}} metric tracks all the bytes read. To provide additional
metrics for each network distance, we can add new metrics at the FileSystem level and have
{{DFSInputStream}} update them based on the network distance between the client and the datanode.
> {{DFSClient}} will resolve the client machine's network location as part of its initialization.
It doesn't need to resolve the datanode's network location for each read, as {{DatanodeInfo}} already
has the info.
> There are existing HDFS-specific metrics such as {{ReadStatistics}} and {{DFSHedgedReadMetrics}}.
But these metrics are only accessible via {{DFSClient}} or {{DFSInputStream}}, so they are not something
that application frameworks such as MR and Tez can get to. That is the benefit of storing these
new metrics in FileSystem.Statistics.
> This jira only includes metrics generation by HDFS. The consumption of these metrics
by MR and Tez will be tracked in separate jiras.
> We can add similar metrics for the HDFS write scenario later if necessary.
