hadoop-hdfs-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1338) Improve TestDFSIO
Date Wed, 11 Aug 2010 04:51:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897137#action_12897137 ]

Arun C Murthy commented on HDFS-1338:
-------------------------------------

{quote}
DFSIO benchmark is designed to measure HDFS data transfer performance only.
TestDFSIO is not intended to benchmark typical MR usage pattern.
TestDFSIO intentionally avoids any overhead or optimizations induced by MR framework.
{quote}

A benchmark should be something we use to reason about a particular aspect of the framework,
in this case performance.

The point I'm trying to make is that TestDFSIO, as it stands, is formulated in a way that
makes it impossible to reason about its results. I don't particularly care how we implement it,
and I agree it shouldn't be constrained by the vagaries of the Map-Reduce scheduler. However,
we do need a benchmark that does node-local, rack-local, and off-switch reads and writes in a
predictable manner, so that when we notice a difference in the results of the benchmark we
are in a position to reason about it.

> Improve TestDFSIO
> -----------------
>
>                 Key: HDFS-1338
>                 URL: https://issues.apache.org/jira/browse/HDFS-1338
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun C Murthy
>
> Currently the read test in TestDFSIO benchmark just opens a large side file and measures
> the read performance. The MR scheduler has no opportunity to do *any* optimization for the
> TestDFSIO MR application. The side-effect of this is that it is *very* hard to do any meaningful
> analysis of the results of the benchmark i.e. to check if node-local or rack-local or off-switch
> read performance improved/degraded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

