hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Re: Quick question
Date Sun, 20 Feb 2011 19:47:19 GMT
Hi again Jim and Ted,

 I understood that each mapper will be getting a block of lines... but even though I had
only 2 mappers for a 16-line input file and TextInputFormat is used, the map function is
called once for each of those 16 lines!

I wanted a block of lines per map ... hence something like map1 has 8 lines and map2 has 8
lines. 
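
To make the behaviour I am seeing concrete, here is a minimal plain-Java sketch. It does not use any Hadoop classes; `MapCallDemo`, `runMapperTask` and `runTasks` are made-up names, and splitting by line count is a simplification (real splits are byte ranges). It only illustrates the pattern where one mapper task owns one split and calls the map function once per line in that split, which is also how the loop in the new-API `Mapper.run()` is written.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation; no Hadoop classes are used here.
class MapCallDemo {

    // Stand-in for one mapper task: it owns one split (a list of lines)
    // and calls map() once for every record in that split.
    static int runMapperTask(List<String> split) {
        int mapCalls = 0;
        for (String line : split) {
            map(line);
            mapCalls++;
        }
        return mapCalls;
    }

    // Stand-in for the user's map function: it sees exactly one line.
    static void map(String line) {
        // per-line work would go here
    }

    // Divide totalLines into numTasks contiguous splits (by line count,
    // which is an assumption of this sketch) and run one task per split.
    static List<Integer> runTasks(int totalLines, int numTasks) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < totalLines; i++) {
            lines.add("line " + i);
        }
        List<Integer> callsPerTask = new ArrayList<>();
        int perTask = (totalLines + numTasks - 1) / numTasks;
        for (int start = 0; start < totalLines; start += perTask) {
            int end = Math.min(start + perTask, totalLines);
            callsPerTask.add(runMapperTask(lines.subList(start, end)));
        }
        return callsPerTask;
    }

    public static void main(String[] args) {
        // 16 lines, 2 mapper tasks: each task makes 8 map() calls.
        System.out.println(runTasks(16, 2));
    }
}
```

So in this sketch there are 2 mapper tasks but 16 map calls in total, 8 per task.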

So first question: is there a difference between Mappers and maps ?

Second: does that mean I need to write my own InputFormat to make the InputSplit equal to
multiple lines?

Thank you,

Maha


On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:

> That's right. The TextInputFormat handles situations where records cross split boundaries.
> What your mapper will see is "whole" records.
> 
> -----Original Message-----
> From: maha [mailto:maha@umail.ucsb.edu] 
> Sent: Friday, February 18, 2011 1:14 PM
> To: common-user
> Subject: Quick question
> 
> Hi all,
> 
>  I want to check if the following statement is right:
> 
> If I use TextInputFormat to process a text file with 2000 lines (each ending with \n)
> with 20 mappers, then each map will have a sequence of COMPLETE LINES.
> 
> In other words,  the input is not split byte-wise but by lines. 
> 
> Is that right?
> 
> 
> Thank you,
> Maha
> 
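
For reference, the point quoted above about records crossing split boundaries can be sketched in plain Java. This is only an illustration of one consistent rule, not the actual Hadoop reader code: `SplitBoundaryDemo` and `readSplit` are made-up names, and the rule assumed here is that a split owns every line whose first byte falls inside its byte range, reading past the end of the range to finish the last line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration; no Hadoop classes are used here.
class SplitBoundaryDemo {

    // Return the complete lines owned by the byte range [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard through the next newline.
            // If byte start-1 is itself a newline, nothing is lost and the
            // line beginning at 'start' is kept; otherwise the partial line
            // is skipped because the previous split reads it in full.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // Read every line that begins before 'end', even if it finishes after it.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.US_ASCII));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.US_ASCII);
        // Byte 13 falls in the middle of "charlie".
        System.out.println(readSplit(data, 0, 13));
        System.out.println(readSplit(data, 13, data.length));
    }
}
```

The split boundary is a byte offset, but under this rule every line is handed to exactly one split and is never cut in half.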

