hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4698) provide client-side metrics for remote reads, local reads, and short-circuit reads
Date Wed, 24 Apr 2013 04:11:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640059#comment-13640059 ]

Aaron T. Myers commented on HDFS-4698:

Patch looks pretty good to me, though I do believe the findbugs warning is legitimate.

A few little comments:

# It looks to me like the patch misses a place in DFSInputStream where it should be adding
to the statistics before closing a BlockReader. Currently the patch only adds the stats in
DFSInputStream#blockSeekTo, but I think they should also be added in DFSInputStream#close.
# Recommend you add a comment to DFSInputStream#getReadStatistics about how to use the API,
i.e. that the stats will only be up-to-date after closing the DFSInputStream.
# Recommend adding comments to DFSInputStream.ReadStatistics explaining the meaning of the
various fields, i.e. that short-circuit read (SCR) bytes count toward both the SCR and
"local bytes" totals, that total >= local >= SCR, that remote bytes read can be determined
by total - local, etc.
# For that matter, you might want to add a getRemoteBytesRead method to DFSInputStream.ReadStatistics
to do the subtraction for the user.
# Any thoughts about how this new feature should interact with the existing FileSystem#Statistics
class? Valid answers include "not at all" and/or "this will be helpful as-is, we can think
about that later."
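
The field relationships described in the comments above can be sketched roughly as follows. This is only an illustration, not the code from HDFS-4698.001.patch: the class name ReadStatisticsSketch, its field names, and the add* methods are hypothetical. Only getRemoteBytesRead, and the rule that it equals total minus local, come from the suggestions above.

```java
/**
 * Hypothetical sketch of client-side read statistics.
 * Not the DFSInputStream.ReadStatistics class from the patch;
 * all names other than getRemoteBytesRead are assumptions.
 */
class ReadStatisticsSketch {
    // Every byte read, whether remote, local, or short-circuit.
    private long totalBytesRead;
    // Bytes read from a local DataNode; includes short-circuit bytes.
    private long totalLocalBytesRead;
    // Bytes read via short-circuit reads; a subset of the local bytes.
    private long totalShortCircuitBytesRead;

    // Remote reads count toward the total only.
    void addRemoteBytes(long n) {
        totalBytesRead += n;
    }

    // Local (non-short-circuit) reads count toward total and local.
    void addLocalBytes(long n) {
        totalBytesRead += n;
        totalLocalBytesRead += n;
    }

    // Short-circuit reads count toward all three counters,
    // which keeps total >= local >= SCR true by construction.
    void addShortCircuitBytes(long n) {
        totalBytesRead += n;
        totalLocalBytesRead += n;
        totalShortCircuitBytesRead += n;
    }

    long getTotalBytesRead() {
        return totalBytesRead;
    }

    long getTotalLocalBytesRead() {
        return totalLocalBytesRead;
    }

    long getTotalShortCircuitBytesRead() {
        return totalShortCircuitBytesRead;
    }

    // Does the subtraction for the caller: remote = total - local.
    long getRemoteBytesRead() {
        return totalBytesRead - totalLocalBytesRead;
    }
}
```

Because each counter is only ever incremented together with the counters that contain it, the subtraction in getRemoteBytesRead cannot go negative in this sketch.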
> provide client-side metrics for remote reads, local reads, and short-circuit reads
> ----------------------------------------------------------------------------------
>                 Key: HDFS-4698
>                 URL: https://issues.apache.org/jira/browse/HDFS-4698
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4698.001.patch
>
> We should provide metrics to let clients know how many bytes of data they have read remotely,
> versus locally or via short-circuit local reads.  This will allow clients to know how well
> they're doing at bringing the computation to the data, which will be useful in evaluating
> placement policies and cluster configurations.
