hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-72) hadoop doesn't take advatage of distributed compiting in TestDFSIO
Date Sat, 11 Mar 2006 02:32:39 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-72?page=all ]

Konstantin Shvachko updated HADOOP-72:
--------------------------------------

    Attachment:     (was: TestDFSIO.java)

> hadoop doesn't take advatage of distributed compiting in TestDFSIO
> ------------------------------------------------------------------
>
>          Key: HADOOP-72
>          URL: http://issues.apache.org/jira/browse/HADOOP-72
>      Project: Hadoop
>         Type: Test
>   Components: dfs, fs, mapred
>  Environment: 200 node cluster
>     Reporter: Konstantin Shvachko
>  Attachments: TestDFSIO_results_200_node_cluster.log, TestDFSIO_results_sequential.log
>
> TestDFSIO runs N map jobs, each either writing to or reading from a separate file of
the same size, 
> and collects statistical information on its performance. 
> The reducer further calculates the overall statistics for all maps. 
> It outputs the following data:
> - read or write test
> - date and time the test finished   
> - number of files
> - total number of bytes processed
> - overall throughput in mb/sec
> - average IO rate in mb/sec per file
> __Results__
> I run 7 iterations of the test one after another on a cluster of ~200 nodes. 
> The file size is the same in all cases 320Mb. 
> The number of files tried is 1,2,4,8,16,32,64.
> The log file with statistics is attached.
> It looks like we don't have any distributed computing here at all.
> The total execution time increases proportionally to the total size of data both for
writes and reads.
> Another thing is that the io ratio for read is higher than the write rate just gradually.
> For comparison I attach time measuring for the same ios performed on the same cluster
but sequentially in a simple loop.
> This is the summary:
> Files	map/red time	sequential time
>  1		49			  34 
>  2		86			  69
>  4		158			131
>  8		299			266
> 16		569			532
> 32		1131
> 64		2218
> This doesn't look good, unless there is something wrong with my test (attached) or the
cluster settings.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Mime
View raw message