hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Re: Quick question
Date Sun, 20 Feb 2011 19:59:52 GMT
Actually, the following solved my problem, but I'm a little suspicious of the side effects
of doing this instead of writing my own InputSplit that covers 5 lines.

 conf.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class); // default: # of maps = # of lines
 conf.setInt("mapred.line.input.format.linespermap", 5); // override: # of lines per mapper = 5

If you have any thoughts on whether the solution above is worse than writing my own InputSplit
of about 5 lines, let me know.
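
For intuition about one likely side effect: with N lines per split, the number of map
tasks grows as ceil(total lines / N), so a large input produces many small tasks. The
sketch below is plain Java with no Hadoop classes. LinesPerMapSketch and groupLines are
hypothetical names, and the grouping only mimics what the setting above is assumed to do.

```java
import java.util.ArrayList;
import java.util.List;

public class LinesPerMapSketch {
    // Hypothetical helper: groups lines into chunks of at most linesPerMap,
    // one chunk standing in for the input of one map task.
    static List<List<String>> groupLines(List<String> lines, int linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerMap) {
            int end = Math.min(i + linesPerMap, lines.size());
            groups.add(new ArrayList<>(lines.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line " + i);
        }
        List<List<String>> groups = groupLines(lines, 5);
        System.out.println("map tasks: " + groups.size());
        for (List<String> g : groups) {
            System.out.println("  task gets " + g.size() + " line(s)");
        }
    }
}
```

For the 16-line file discussed in this thread, 5 lines per mapper gives 4 tasks, the last
one holding a single line.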

Thanks everyone !

Maha
	    
On Feb 20, 2011, at 11:47 AM, maha wrote:

> Hi again Jim and Ted,
> 
> I understood that each mapper will get a block of lines, but even though I had only
> 2 mappers for a 16-line input file with TextInputFormat, the map function was called
> once for each of those 16 lines!
> 
> I wanted a block of lines per map, so that, for example, map1 gets 8 lines and map2
> gets 8 lines.
> 
> So, first question: is there a difference between Mappers and maps?
> 
> Second: does that mean I need to write my own InputFormat to make each InputSplit
> cover multiple lines?
> 
> Thank you,
> 
> Maha
> 
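On the Mappers-versus-maps question, the usual reading is that one mapper (map task) is
created per input split, while the map function is called once per record in that split.
So 2 mappers over 16 lines still means 16 map calls. One way to work on a block of lines
is to buffer records in map and process the buffer when the task ends. The sketch below
is plain Java with no Hadoop classes. BlockMapper and runTask are hypothetical stand-ins
for a mapper and the framework's per-split loop, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

public class MapCallsSketch {
    // Hypothetical stand-in for a mapper: one instance per split.
    static class BlockMapper {
        int mapCalls = 0;
        final List<String> buffer = new ArrayList<>();

        // Called once per record (line), like a map function.
        void map(String line) {
            mapCalls++;
            buffer.add(line);
        }

        // Called once when the task finishes; the whole block is available here.
        String close() {
            return String.join("|", buffer);
        }
    }

    // Hypothetical stand-in for the framework loop over one split.
    static BlockMapper runTask(List<String> split) {
        BlockMapper mapper = new BlockMapper();
        for (String line : split) {
            mapper.map(line);
        }
        return mapper;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 16; i++) {
            lines.add("line" + i);
        }
        BlockMapper m1 = runTask(lines.subList(0, 8));
        BlockMapper m2 = runTask(lines.subList(8, 16));
        System.out.println("mappers: 2, map calls: " + (m1.mapCalls + m2.mapCalls));
        System.out.println("block seen by mapper 1: " + m1.close());
    }
}
```

In the old mapred API the end-of-task hook would presumably be the mapper's close()
method, which is an assumption to verify against the Javadoc before relying on it.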
> 
> On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
> 
>> That's right. The TextInputFormat handles situations where records cross split
>> boundaries. What your mapper will see is "whole" records.
>> 
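A rough illustration of how whole records can come out of byte-based splits: a reader
that does not start at offset 0 skips the partial line it lands in, and every reader
finishes the last line it started even if that runs past its split end. This is a
simplified sketch, not Hadoop's actual record reader. SplitBoundarySketch and readSplit
are hypothetical names, and the input is assumed to be ASCII so that character offsets
equal byte offsets.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBoundarySketch {
    // Returns the lines whose first character lies in [start, end).
    static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Back up one character and skip through the next newline, so a line
            // that begins exactly at 'start' is kept and a partial line is dropped.
            int nl = data.indexOf('\n', start - 1);
            if (nl < 0) {
                return records; // no line begins inside this split
            }
            pos = nl + 1;
        }
        while (pos < end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            // May read past 'end' to complete the line that started inside the split.
            records.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "alpha\nbravo\ncharlie\ndelta\n";
        // Offset 13 falls in the middle of "charlie".
        System.out.println(readSplit(data, 0, 13));
        System.out.println(readSplit(data, 13, data.length()));
    }
}
```

With the boundary in the middle of "charlie", the first split yields alpha, bravo and
charlie, and the second yields only delta, so each line is seen whole and exactly once.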
>> -----Original Message-----
>> From: maha [mailto:maha@umail.ucsb.edu] 
>> Sent: Friday, February 18, 2011 1:14 PM
>> To: common-user
>> Subject: Quick question
>> 
>> Hi all,
>> 
>> I want to check if the following statement is right:
>> 
>> If I use TextInputFormat to process a text file with 2000 lines (each ending with
>> \n) with 20 mappers, then each map will get a sequence of COMPLETE LINES.
>> 
>> In other words, the input is not split byte-wise but by lines.
>> 
>> Is that right?
>> 
>> 
>> Thank you,
>> Maha
>> 
> 

