hadoop-mapreduce-dev mailing list archives

From "Burkhardt, Paul" <Paul_Burkha...@sra.com>
Subject RE: [jira] Created: (MAPREDUCE-1973) Optimize input split creation
Date Fri, 30 Jul 2010 21:33:59 GMT
I ran "ant test" on CDH3B2 (hadoop-0.20.2+320) and it fails prior to and
after patching, so I don't think it is the patch. See the build/test
directory and review the TEST output files. In my environment, the
TestFileAppend4 test fails.
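
A minimal sketch of one way to do that review automatically. It assumes
the reports are plain-text files named TEST-*.txt sitting directly in
build/test, and that each contains a summary line in the same
"Tests run: N, Failures: N, Errors: N" format that ant prints to the
console. The class and method names (FailedTestFinder, hasFailures,
failedReports) are made up for this example; adjust the directory and
file pattern if your build writes its reports elsewhere.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop or ant.
class FailedTestFinder {
    // Matches the JUnit summary line, e.g. "Tests run: 20, Failures: 0, Errors: 0, ..."
    private static final Pattern SUMMARY =
        Pattern.compile("Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+)");

    // True if the line is a summary line reporting at least one failure or error.
    static boolean hasFailures(String line) {
        Matcher m = SUMMARY.matcher(line);
        return m.find()
            && (Integer.parseInt(m.group(2)) > 0 || Integer.parseInt(m.group(3)) > 0);
    }

    // Returns the names of the report files in dir whose summary shows failures or errors.
    static List<String> failedReports(Path dir) throws IOException {
        List<String> failed = new ArrayList<>();
        // Assumption: reports are named TEST-<class>.txt and live directly in dir.
        try (DirectoryStream<Path> reports = Files.newDirectoryStream(dir, "TEST-*.txt")) {
            for (Path report : reports) {
                // ISO-8859-1 decodes any byte sequence, so odd test output cannot abort the scan.
                for (String line : Files.readAllLines(report, StandardCharsets.ISO_8859_1)) {
                    if (hasFailures(line)) {
                        failed.add(report.getFileName().toString());
                        break;
                    }
                }
            }
        }
        Collections.sort(failed);
        return failed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "build/test");
        for (String name : failedReports(dir)) {
            System.out.println(name);
        }
    }
}
```

Run it from the top of the source tree, or pass the report directory as
the first argument; each printed file name is a test class to open and
read in full.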


-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Wednesday, July 28, 2010 7:06 PM
To: mapreduce-dev@hadoop.apache.org
Subject: Re: [jira] Created: (MAPREDUCE-1973) Optimize input split creation

I applied the patch on cdh3b2.
ant test gave me:
    [junit] Running org.apache.hadoop.mrunit.types.TestPair
    [junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 0.041

/Users/tyu/hadoop-0.20.2+320/build.xml:839: Tests failed!

How can I find out which tests actually failed?


On Tue, Jul 27, 2010 at 4:15 PM, Paul Burkhardt (JIRA)

> Optimize input split creation
> -----------------------------
>                 Key: MAPREDUCE-1973
>                 URL:
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.2, 0.20.1
>         Environment: Intel Nehalem cluster running Red Hat.
>            Reporter: Paul Burkhardt
>            Priority: Minor
> The input split returns the locations that host the file blocks in the
> split. The locations are determined by the getBlockLocations method of
> the filesystem client, which requires a remote connection to the
> filesystem (HDFS). The remote connection is made for each file in the
> entire input split. For jobs with many input files the network
> connections dominate the cost of writing the input split file.
> A job requests a listing of the input files from the remote filesystem
> and creates a FileStatus object as a handle for each file in the
> listing. The FileStatus object can be imbued with the necessary host
> information on the remote end and passed to the client side in the bulk
> return of the request. A getHosts method of the FileStatus would then
> return the hosts for the blocks comprising that file and eliminate the
> need for another call to the remote filesystem.
> The INodeFile maintains the blocks for a file and is an obvious choice
> to be the originator for the locations of that file. It is also
> available to the FSDirectory, which first creates the listing of
> FileStatus objects. We propose that the block locations be generated by
> the INodeFile to instantiate the FileStatus object during the
> getListing request.
> Our tests demonstrated a factor of 2000 speedup for approximately
> input files.
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
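
The round-trip argument in the quoted issue can be illustrated with a toy
model. This is a sketch only: it does not use Hadoop classes and is not
the patch. ToyRemoteFs, FileEntry, list, lookupHosts and sampleFs are
hypothetical stand-ins; only the getHosts name comes from the proposal.
The model simply counts simulated remote calls for the per-file lookup
pattern and for a listing that already carries the hosts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Hadoop classes.
class SplitLocationSketch {

    // Stand-in for a listing entry; hosts is null unless the listing carried them.
    static final class FileEntry {
        final String path;
        private final String[] hosts;

        FileEntry(String path, String[] hosts) {
            this.path = path;
            this.hosts = hosts;
        }

        // Mirrors the proposed getHosts idea: a local read, no remote call.
        String[] getHosts() {
            return hosts;
        }
    }

    // Stand-in for the remote filesystem; every method call counts as one round trip.
    static final class ToyRemoteFs {
        private final Map<String, String[]> blockHosts = new LinkedHashMap<>();
        int roundTrips = 0;

        void addFile(String path, String... hosts) {
            blockHosts.put(path, hosts);
        }

        // One round trip returns the whole listing, optionally with hosts attached.
        List<FileEntry> list(boolean withHosts) {
            roundTrips++;
            List<FileEntry> out = new ArrayList<>();
            for (Map.Entry<String, String[]> e : blockHosts.entrySet()) {
                out.add(new FileEntry(e.getKey(), withHosts ? e.getValue() : null));
            }
            return out;
        }

        // One round trip per file, as in the per-file location lookup described above.
        String[] lookupHosts(String path) {
            roundTrips++;
            return blockHosts.get(path);
        }
    }

    // Listing followed by one lookup per file.
    static int currentApproach(ToyRemoteFs fs) {
        int before = fs.roundTrips;
        for (FileEntry f : fs.list(false)) {
            fs.lookupHosts(f.path);
        }
        return fs.roundTrips - before;
    }

    // Listing that already carries the hosts; getHosts is answered locally.
    static int proposedApproach(ToyRemoteFs fs) {
        int before = fs.roundTrips;
        for (FileEntry f : fs.list(true)) {
            f.getHosts();
        }
        return fs.roundTrips - before;
    }

    // Builds a toy filesystem with n files, each on two made-up hosts.
    static ToyRemoteFs sampleFs(int n) {
        ToyRemoteFs fs = new ToyRemoteFs();
        for (int i = 0; i < n; i++) {
            fs.addFile("/input/part-" + i, "host" + (i % 4), "host" + ((i + 1) % 4));
        }
        return fs;
    }

    public static void main(String[] args) {
        ToyRemoteFs fs = sampleFs(1000);
        System.out.println("per-file lookups: " + currentApproach(fs) + " round trips");
        System.out.println("bulk listing:     " + proposedApproach(fs) + " round trips");
    }
}
```

In this model the per-file pattern costs one call for the listing plus
one per file, while the bulk listing costs a single call regardless of
the number of files. The model says nothing about real latencies or the
size of the listing response, so it should not be read as reproducing
the measured speedup.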
