hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: One file per mapper
Date Wed, 06 Jul 2011 14:46:32 GMT
On Tue, Jul 5, 2011 at 5:28 PM, Jim Falgout <jim.falgout@pervasive.com> wrote:

> I've done this before by placing the name of each file to process into a
> single file (newline separated) and using the NLineInputFormat class as the
> input format. Run your job with the single file with all of the file names
> to process as the input. Each mapper will then be handed one line (this is
> tunable) from the single input file. The line will contain the name of the
> file to process.
> You can also write your own InputFormat class that creates a split for each
> file.
> Both of these options have scalability issues, which raises the question:
> why one file per mapper?
> -----Original Message-----
> From: Govind Kothari [mailto:govindkothari@gmail.com]
> Sent: Tuesday, July 05, 2011 3:04 PM
> To: common-user@hadoop.apache.org
> Subject: One file per mapper
> Hi,
> I am new to Hadoop. I have a set of files and I want to assign each file to
> a mapper. Also, in the mapper there should be a way to know the complete path
> of the file. Can you please tell me how to do that?
> Thanks,
> Govind
> --
> Govind Kothari
> Graduate Student
> Dept. of Computer Science
> University of Maryland College Park
> <---Seek Excellence, Success will Follow --->
You can also do this with the MultipleInputs and MultipleOutputs classes. Each
source file can have a different mapper.
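
For the manifest approach Jim describes in the quoted message (one file name
per line, fed to NLineInputFormat), the manifest itself could be generated
along these lines. This is only a rough sketch under some assumptions: the
class name ManifestBuilder and its two helper methods are made up for
illustration, and it assumes the input files sit in a single directory on a
local or mounted filesystem. For files already in HDFS you would build the
list some other way, for example from the output of "hadoop fs -ls".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop: builds the newline-separated
// list of file names that the job would take as its single input file.
class ManifestBuilder {

    // Returns the absolute paths of the regular files directly under
    // inputDir, sorted so the manifest is stable between runs.
    // Subdirectories are skipped, not descended into.
    static List<String> listInputFiles(Path inputDir) throws IOException {
        try (Stream<Path> entries = Files.list(inputDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Writes one path per line, which is the layout the manifest
    // approach expects (one line handed to each mapper by default).
    static void writeManifest(List<String> paths, Path manifest) throws IOException {
        Files.write(manifest, paths, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ManifestBuilder <input-dir> <manifest-file>");
            System.exit(1);
        }
        List<String> paths = listInputFiles(Paths.get(args[0]));
        writeManifest(paths, Paths.get(args[1]));
        System.out.println("wrote " + paths.size() + " paths to " + args[1]);
    }
}
```

Write the manifest outside the input directory, otherwise a second run would
list the manifest as one of the files to process. With this approach the
mapper gets the complete path as its input value, so it does not need to ask
the framework for it.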
