hadoop-mapreduce-dev mailing list archives

From "Burkhardt, Paul" <Paul_Burkha...@sra.com>
Subject RE: [jira] Created: (MAPREDUCE-1973) Optimize input split creation
Date Fri, 30 Jul 2010 21:33:59 GMT
I ran "ant test" on CDH3B2 (hadoop-0.20.2+320) and it fails both before and
after patching, so I don't think the patch is the cause. To see which tests
failed, review the TEST output files in the build/test directory. In my
environment, the TestFileAppend4 test fails.
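
As a rough aid for reviewing those files, the sketch below scans a report
directory and lists the reports whose summary line shows failures or errors.
It assumes plain-text reports named TEST-* that contain a summary line in the
same format as the console output quoted further down ("Tests run: ...,
Failures: ..., Errors: ..."). The file naming, the default directory, and the
function name failed_reports are assumptions for illustration; XML-formatted
reports would need a different parser.

```python
import re
import sys
from pathlib import Path

# Summary line as printed by the JUnit runner, for example:
# "Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 0.041 sec"
SUMMARY = re.compile(
    r"Tests run:\s*(\d+),\s*Failures:\s*(\d+),\s*Errors:\s*(\d+)"
)


def failed_reports(report_dir):
    """Return (file name, failures, errors) for each report with a non-zero count."""
    failed = []
    # Assumption: report files are named TEST-<something> inside report_dir.
    for path in sorted(Path(report_dir).glob("TEST-*")):
        if not path.is_file():
            continue
        match = SUMMARY.search(path.read_text(errors="replace"))
        if match is None:
            continue  # no summary line found (for example an XML report)
        failures, errors = int(match.group(2)), int(match.group(3))
        if failures or errors:
            failed.append((path.name, failures, errors))
    return failed


if __name__ == "__main__":
    # Hypothetical default; pass the real report directory as the first argument.
    report_dir = sys.argv[1] if len(sys.argv) > 1 else "build/test"
    for name, failures, errors in failed_reports(report_dir):
        print(f"{name}: {failures} failure(s), {errors} error(s)")
```

Saved under any name, the script takes the report directory as its only
argument and prints one line per failing report.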

Paul

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Wednesday, July 28, 2010 7:06 PM
To: mapreduce-dev@hadoop.apache.org
Subject: Re: [jira] Created: (MAPREDUCE-1973) Optimize input split creation

I applied the patch on cdh3b2.
ant test gave me:
    [junit] Running org.apache.hadoop.mrunit.types.TestPair
    [junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 0.041 sec

BUILD FAILED
/Users/tyu/hadoop-0.20.2+320/build.xml:839: Tests failed!

How can I find out which tests actually failed?

Thanks

On Tue, Jul 27, 2010 at 4:15 PM, Paul Burkhardt (JIRA) <jira@apache.org> wrote:

> Optimize input split creation
> -----------------------------
>
>                 Key: MAPREDUCE-1973
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1973
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.2, 0.20.1
>         Environment: Intel Nehalem cluster running Red Hat.
>            Reporter: Paul Burkhardt
>            Priority: Minor
>
>
> The input split returns the locations that host the file blocks in the
> split. The locations are determined by the getBlockLocations method of the
> filesystem client which requires a remote connection to the filesystem (i.e.
> HDFS). The remote connection is made for each file in the entire input
> split. For jobs with many input files the network connections dominate the
> cost of writing the input split file.
>
> A job requests a listing of the input files from the remote filesystem and
> creates a FileStatus object as a handle for each file in the listing. The
> FileStatus object can be imbued with the necessary host information on the
> remote end and passed to the client-side in the bulk return of the listing
> request. A getHosts method of the FileStatus would then return the locations
> for the blocks comprising that file and eliminate the need for another trip
> to the remote filesystem.
>
> The INodeFile maintains the blocks for a file and is an obvious choice to
> be the originator for the locations of that file. It is also available to
> the FSDirectory which first creates the listing of FileStatus objects. We
> propose that the block locations be generated by the INodeFile to
> instantiate the FileStatus object during the getListing request.
>
> Our tests demonstrated a factor of 2000 speedup for approximately 60,000
> input files.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
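
The difference between the two approaches described in the issue can be
illustrated by counting remote calls. The sketch below is a toy in-memory
model, not Hadoop code: FakeNameNode, list_status, get_block_locations,
list_status_with_hosts, and the two split functions are hypothetical names
that only mirror the roles described above. It says nothing about real
timings; it shows only that the per-file approach makes one call per file
plus the listing, while the proposed approach makes a single call.

```python
from dataclasses import dataclass


@dataclass
class FakeNameNode:
    """Hypothetical stand-in for the remote filesystem; counts remote calls."""
    hosts_by_file: dict  # path -> list of host names holding that file's blocks
    calls: int = 0

    def list_status(self):
        # Current behaviour: the listing returns file handles only.
        self.calls += 1
        return sorted(self.hosts_by_file)

    def get_block_locations(self, path):
        # Current behaviour: one extra remote call per file.
        self.calls += 1
        return self.hosts_by_file[path]

    def list_status_with_hosts(self):
        # Proposed behaviour: hosts are attached on the remote side
        # and returned in bulk with the listing.
        self.calls += 1
        return sorted(self.hosts_by_file.items())


def split_hosts_per_file(namenode):
    """Listing followed by one location lookup per file."""
    return {
        path: namenode.get_block_locations(path)
        for path in namenode.list_status()
    }


def split_hosts_bulk(namenode):
    """Single listing request that already carries the hosts."""
    return dict(namenode.list_status_with_hosts())


if __name__ == "__main__":
    files = {f"/input/part-{i:05d}": [f"host{i % 4}"] for i in range(1000)}
    for build in (split_hosts_per_file, split_hosts_bulk):
        namenode = FakeNameNode(dict(files))
        build(namenode)
        print(f"{build.__name__}: {namenode.calls} remote call(s)")
```

Both functions return the same mapping from file to hosts; only the number
of simulated remote calls differs.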
