hadoop-mapreduce-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: [jira] Created: (MAPREDUCE-1973) Optimize input split creation
Date Wed, 28 Jul 2010 23:05:52 GMT
I applied the patch on cdh3b2.
ant test gave me:
    [junit] Running org.apache.hadoop.mrunit.types.TestPair
    [junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 0.041 sec

/Users/tyu/hadoop-0.20.2+320/build.xml:839: Tests failed!

How can I find out which tests actually failed?
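
One way to narrow this down, assuming the build writes per-class JUnit reports named TEST-<class>.txt (Ant's usual naming for the plain formatter) into a directory such as build/test, is to scan those reports for a summary line with a non-zero failure or error count. The directory, the file pattern, and the class name FailedTestFinder below are assumptions for illustration, not something taken from this build.xml; this is a rough sketch rather than a definitive tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists JUnit report files that record a failure or error.
public class FailedTestFinder {
    // Matches summary lines of the form shown in the ant output above,
    // e.g. "Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 0.041 sec".
    private static final Pattern SUMMARY =
        Pattern.compile("Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+)");

    /** Returns true if the line is a summary line with at least one failure or error. */
    public static boolean isFailing(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            return false;
        }
        return Integer.parseInt(m.group(2)) > 0 || Integer.parseInt(m.group(3)) > 0;
    }

    /** Returns the report files in dir (assumed pattern TEST-*.txt) that contain a failing summary. */
    public static List<Path> failingReports(Path dir) throws IOException {
        List<Path> failing = new ArrayList<>();
        try (DirectoryStream<Path> reports = Files.newDirectoryStream(dir, "TEST-*.txt")) {
            for (Path report : reports) {
                // ISO-8859-1 decodes any byte sequence, so odd test output cannot abort the scan.
                for (String line : Files.readAllLines(report, StandardCharsets.ISO_8859_1)) {
                    if (isFailing(line)) {
                        failing.add(report);
                        break;
                    }
                }
            }
        }
        return failing;
    }

    public static void main(String[] args) throws IOException {
        // "build/test" is an assumed default; pass the real report directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "build/test");
        for (Path report : failingReports(dir)) {
            System.out.println(report);
        }
    }
}
```

Each path printed names a test class whose report records a failure or error; the report file itself should contain the stack trace.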


On Tue, Jul 27, 2010 at 4:15 PM, Paul Burkhardt (JIRA) <jira@apache.org> wrote:

> Optimize input split creation
> -----------------------------
>                 Key: MAPREDUCE-1973
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1973
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.2, 0.20.1
>         Environment: Intel Nehalem cluster running Red Hat.
>            Reporter: Paul Burkhardt
>            Priority: Minor
>
> The input split returns the locations that host the file blocks in the
> split. The locations are determined by the getBlockLocations method of the
> filesystem client which requires a remote connection to the filesystem (i.e.
> HDFS). The remote connection is made for each file in the entire input
> split. For jobs with many input files the network connections dominate the
> cost of writing the input split file.
>
> A job requests a listing of the input files from the remote filesystem and
> creates a FileStatus object as a handle for each file in the listing. The
> FileStatus object can be imbued with the necessary host information on the
> remote end and passed to the client-side in the bulk return of the listing
> request. A getHosts method of the FileStatus would then return the locations
> for the blocks comprising that file and eliminate the need for another trip
> to the remote filesystem.
>
> The INodeFile maintains the blocks for a file and is an obvious choice to
> be the originator for the locations of that file. It is also available to
> the FSDirectory which first creates the listing of FileStatus objects. We
> propose that the block locations be generated by the INodeFile to
> instantiate the FileStatus object during the getListing request.
>
> Our tests demonstrated a factor of 2000 speedup for approximately 60,000
> input files.
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
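
The idea in the quoted proposal can be illustrated with a small, self-contained sketch that counts round trips to the remote filesystem. Everything here is a hypothetical stand-in rather than Hadoop code: FakeRemoteFs plays the remote filesystem, StatusWithHosts plays a FileStatus that already carries host information, and only the getHosts name comes from the proposal itself. It is not the patch attached to MAPREDUCE-1973.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compares per-file location lookups with a bulk listing that carries hosts.
public class SplitLocationSketch {

    /** Stand-in for a FileStatus that is filled with host information on the remote end. */
    public static final class StatusWithHosts {
        private final String path;
        private final List<String> hosts;

        public StatusWithHosts(String path, List<String> hosts) {
            this.path = path;
            this.hosts = hosts;
        }

        public String getPath() { return path; }

        // Named after the getHosts method suggested in the proposal; answers locally.
        public List<String> getHosts() { return hosts; }
    }

    /** In-memory stand-in for the remote filesystem; every public call counts as one round trip. */
    public static final class FakeRemoteFs {
        private final Map<String, List<String>> hostsByPath = new LinkedHashMap<>();
        private int roundTrips = 0;

        public void addFile(String path, List<String> hosts) {
            hostsByPath.put(path, hosts); // test setup only, not counted as a round trip
        }

        public int getRoundTrips() { return roundTrips; }

        public List<String> listPaths() {
            roundTrips++;
            return new ArrayList<>(hostsByPath.keySet());
        }

        public List<String> getBlockHosts(String path) {
            roundTrips++;
            return hostsByPath.get(path);
        }

        public List<StatusWithHosts> listWithHosts() {
            roundTrips++;
            List<StatusWithHosts> listing = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : hostsByPath.entrySet()) {
                listing.add(new StatusWithHosts(e.getKey(), e.getValue()));
            }
            return listing;
        }
    }

    /** One listing call plus one location call per file; returns the round trips used. */
    public static int currentApproach(FakeRemoteFs fs) {
        int before = fs.getRoundTrips();
        for (String path : fs.listPaths()) {
            fs.getBlockHosts(path);
        }
        return fs.getRoundTrips() - before;
    }

    /** One bulk listing; hosts are then read locally from each status. */
    public static int proposedApproach(FakeRemoteFs fs) {
        int before = fs.getRoundTrips();
        for (StatusWithHosts status : fs.listWithHosts()) {
            status.getHosts(); // no remote call
        }
        return fs.getRoundTrips() - before;
    }

    public static void main(String[] args) {
        FakeRemoteFs fs = new FakeRemoteFs();
        for (int i = 0; i < 1000; i++) {
            fs.addFile("/input/part-" + i, Arrays.asList("host" + (i % 3)));
        }
        System.out.println("per-file lookups: " + currentApproach(fs) + " round trips");
        System.out.println("bulk listing:     " + proposedApproach(fs) + " round trips");
    }
}
```

For N input files the first method makes N + 1 simulated round trips and the second makes one. The sketch only counts calls; it says nothing about the actual speedup, which depends on network latency and the size of the listing response.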
