hadoop-mapreduce-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6729) Accurately compute the test execute time in DFSIO
Date Fri, 08 Jul 2016 07:24:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367340#comment-15367340 ]

ASF GitHub Bot commented on MAPREDUCE-6729:
-------------------------------------------

GitHub user zhangminglei opened a pull request:

    https://github.com/apache/hadoop/pull/112

    MAPREDUCE-6729. Accurately compute the test execute time in DFSIO

    Update the GitHub-side PR so that it works correctly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhangminglei/hadoop trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #112
    
----
commit 2a295d0a1e80df0f9153b7600ff3f38b7c3faee5
Author: zhangminglei <zml13856086071@163.com>
Date:   2016-07-08T03:29:04Z

    MAPREDUCE-6729. Accurately compute the test execute time in DFSIO

----


> Accurately compute the test execute time in DFSIO
> -------------------------------------------------
>
>                 Key: MAPREDUCE-6729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6729
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: benchmarks, performance, test
>    Affects Versions: 2.9.0
>            Reporter: mingleizhang
>            Assignee: mingleizhang
>            Priority: Minor
>              Labels: performance, test
>         Attachments: MAPREDUCE-6729.001.patch
>
>
> When DFSIO is used as a distributed I/O benchmark, especially one that writes many files
> to disk or reads many files from it, the results can suffer from poor performance and
> imprecise values. The problem is that the existing code deletes files before running the
> job, and that deletion falls inside the timed interval. When there are many files, this adds
> extra time, slows the test down, introduces error into the timing statistics, and makes the
> reported throughput imprecise. So we need to replace or improve this hack to prevent this
> from happening in the future.
> {code}
> public static void testWrite() throws Exception {
>   FileSystem fs = cluster.getFileSystem();
>   long tStart = System.currentTimeMillis();
>   // writeTest() first calls fs.delete(..., true) on the data and write directories,
>   // so the deletion time is counted in execTime.
>   bench.writeTest(fs);
>   long execTime = System.currentTimeMillis() - tStart;
>   bench.analyzeResult(fs, TestType.TEST_TYPE_WRITE, execTime);
> }
>
> private void writeTest(FileSystem fs) throws IOException {
>   Path writeDir = getWriteDir(config);
>   fs.delete(getDataDir(config), true);
>   fs.delete(writeDir, true);
>   runIOTest(WriteMapper.class, writeDir);
> }
> {code}
> [https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/TestDFSIO.java]
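The timing problem described in the issue can be illustrated with a minimal local sketch. It is not the MAPREDUCE-6729 patch: it uses java.nio.file on the local disk instead of Hadoop's FileSystem, and the class DfsioTimingSketch and its methods deleteRecursively, writeFiles, timedWrite and throughputMBps are hypothetical stand-ins for fs.delete(..., true), runIOTest(WriteMapper.class, writeDir) and the throughput computation. The idea shown is simply to run the cleanup before the timer starts, so execTime covers only the write phase, plus some illustrative arithmetic on how counting cleanup time lowers the reported throughput.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch only: mirrors the shape of TestDFSIO.writeTest() on the local file system.
public class DfsioTimingSketch {

    // Hypothetical stand-in for fs.delete(dir, true): recursive delete.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Hypothetical stand-in for the write job; returns the number of bytes written.
    static long writeFiles(Path dir, int nFiles, int bytesPerFile) throws IOException {
        Files.createDirectories(dir);
        byte[] buf = new byte[bytesPerFile];
        for (int i = 0; i < nFiles; i++) {
            Files.write(dir.resolve("part-" + i), buf);
        }
        return (long) nFiles * bytesPerFile;
    }

    // Deletes old output first, then times only the write phase (milliseconds).
    static long timedWrite(Path dir, int nFiles, int bytesPerFile) throws IOException {
        deleteRecursively(dir);                   // cleanup: excluded from the measurement
        long tStart = System.currentTimeMillis(); // timer starts after cleanup
        writeFiles(dir, nFiles, bytesPerFile);
        return System.currentTimeMillis() - tStart;
    }

    // Throughput in MB/s; any cleanup time included in execMillis lowers this figure.
    static double throughputMBps(long bytes, long execMillis) {
        return (bytes / (1024.0 * 1024.0)) / (execMillis / 1000.0);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("dfsio-sketch");
        Path dir = base.resolve("io_write");
        writeFiles(dir, 100, 4096);               // leftover output from a "previous run"
        long execTime = timedWrite(dir, 100, 4096);
        System.out.println("write phase took " + execTime + " ms");

        // Illustrative arithmetic: 1 GB written in 10 s, with or without 5 s of timed cleanup.
        long oneGB = 1024L * 1024 * 1024;
        System.out.println(throughputMBps(oneGB, 10_000)); // 102.4
        System.out.println(throughputMBps(oneGB, 15_000)); // about 68.27
        deleteRecursively(base);
    }
}
```

In the illustrative numbers, counting a hypothetical 5 seconds of cleanup in a 10-second write drops the reported figure by a third, which is the kind of skew the issue describes when many files must be deleted first.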



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


