hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Re: Quick Question: LineSplit or BlockSplit
Date Tue, 08 Feb 2011 05:20:17 GMT
Thanks Ted. Then I have to write my own InputFormat to read a block-of-lines per mapper.
 
NLineInputFormat didn't work for me; any working example of it would be appreciated.
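For reference, the grouping NLineInputFormat is meant to perform can be sketched in plain Java, with no Hadoop on the classpath. This is a model of the behavior, not Hadoop's code: `NLineSplitModel` and `split` are hypothetical names. In the old `mapred` API the real class is `org.apache.hadoop.mapred.lib.NLineInputFormat`, which, as far as I know, reads the lines-per-split count from `mapred.line.input.format.linespermap` (default 1); check the docs for your Hadoop version.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java model (no Hadoop dependency) of how an N-lines-per-split
 * input format groups a file's lines: every split gets up to N
 * consecutive lines, and each split would go to one map task.
 */
public class NLineSplitModel {

    /** Groups lines into consecutive chunks of at most n lines. */
    public static List<List<String>> split(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            // Copy so each split is independent of the source list.
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c", "d", "e");
        // n = 1: one map task per line (option 1 in the thread).
        System.out.println(split(lines, 1).size()); // 5
        // n = 2: one map task per block of two lines (option 2).
        System.out.println(split(lines, 2));        // [[a, b], [c, d], [e]]
    }
}
```

With N = 1 each map task receives exactly one line, which matches Mark's one-zip-path-per-mapper case; a larger N gives the block-of-lines option.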

Thanks again,

Maha

On Feb 7, 2011, at 6:32 PM, Mark Kerzner wrote:

> Thanks!
> Mark
> 
> On Mon, Feb 7, 2011 at 8:28 PM, Ted Dunning <tdunning@maprtech.com> wrote:
> 
>> That is quite doable.  One way to do it is to make the max split size quite
>> small.
>> 
>> On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner <markkerzner@gmail.com> wrote:
>> 
>>> Ted,
>>> 
>>> I am also interested in this answer.
>>> 
>>> I put the name of a zip file on a line in an input file, and I want one
>>> mapper to read this line, and start working on it (since it now knows the
>>> path in HDFS). Are you saying it's not doable?
>>> 
>>> Thank you,
>>> Mark
>>> 
>>> On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning <tdunning@maprtech.com> wrote:
>>> 
>>>> Option (1) isn't the way that things normally work.  Besides, map() is
>>>> called many times for each construction of a mapper.
>>>> 
>>>> On Mon, Feb 7, 2011 at 3:38 PM, maha <maha@umail.ucsb.edu> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I would appreciate your thoughts on whether there is any
>>>>> effect on efficiency if:
>>>>> 
>>>>> 1) Mappers were per line in a document
>>>>> 
>>>>> or
>>>>> 
>>>>> 2) Mappers were per block of lines in a document.
>>>>> 
>>>>> 
>>>>> The obvious difference I can see is that (1) has more mappers. Does
>>>>> that mean (1) will be slower because of scheduling time?
>>>>> 
>>>>> Thank you,
>>>>> Maha
>>>>> 
>>>> 
>>> 
>> 
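Maha's original question (one mapper per line versus one per block of lines) can be framed with a back-of-the-envelope cost model. The model is an assumption, not a measurement: it supposes every map task pays a fixed startup and scheduling cost, and that the per-line work does not depend on how lines are grouped. `MapOverheadModel` and the numbers in `main` are hypothetical.

```java
/**
 * Back-of-the-envelope model (an assumption, not a measurement): every map
 * task pays a fixed startup/scheduling overhead, and per-line work is the
 * same regardless of grouping. Total slot-time is then
 * tasks * overheadPerTask + lines * secondsPerLine.
 */
public class MapOverheadModel {

    /** Number of map tasks when each task gets up to linesPerTask lines. */
    public static long tasks(long lines, long linesPerTask) {
        return (lines + linesPerTask - 1) / linesPerTask; // ceiling division
    }

    /** Total slot-time in seconds, summed over all tasks. */
    public static double totalSeconds(long lines, long linesPerTask,
                                      double overheadPerTask, double secondsPerLine) {
        return tasks(lines, linesPerTask) * overheadPerTask + lines * secondsPerLine;
    }

    public static void main(String[] args) {
        long lines = 10_000;
        double overhead = 1.0;  // hypothetical 1 s startup cost per task
        double perLine = 0.01;  // hypothetical 10 ms of real work per line
        // One task per line: 10,000 * 1.0 + 10,000 * 0.01 = 10,100 s of slot-time.
        System.out.println(totalSeconds(lines, 1, overhead, perLine));
        // 1,000 lines per task: 10 * 1.0 + 100 = 110 s of slot-time.
        System.out.println(totalSeconds(lines, 1_000, overhead, perLine));
    }
}
```

The model counts total slot-time, not wall-clock time. With many parallel map slots the wall-clock gap shrinks, but the fixed per-task cost is still paid once per task, so option (1) does more total work whenever that cost is non-trivial.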

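Ted's remark that map is called many times per mapper construction refers to the task lifecycle: a map task constructs one mapper and then feeds it every record in its split. In the new `org.apache.hadoop.mapreduce` API, `Mapper.run` calls `setup` once, `map` once per record, and `cleanup` once. The plain-Java imitation below mirrors that shape without depending on Hadoop; `MapperLifecycleModel`, `LineMapper`, and the counters are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Plain-Java imitation (no Hadoop dependency) of the map-task lifecycle:
 * one mapper object per task, setup() once, map() once per input record,
 * cleanup() once at the end.
 */
public class MapperLifecycleModel {

    static int constructed = 0;
    static int mapCalls = 0;

    /** Hypothetical stand-in for a user's Mapper subclass. */
    static class LineMapper {
        LineMapper() { constructed++; }

        void setup() { /* e.g. open side files once per task */ }

        void map(long offset, String line) { mapCalls++; }

        void cleanup() { /* e.g. flush buffered output once per task */ }

        /** Mirrors the shape of Mapper.run: setup, loop over records, cleanup. */
        void run(Iterator<String> records) {
            setup();
            long offset = 0;
            while (records.hasNext()) {
                String line = records.next();
                map(offset, line);
                // +1 for the newline; assumes single-byte characters so this
                // tracks a byte-offset key like the one TextInputFormat supplies.
                offset += line.length() + 1;
            }
            cleanup();
        }
    }

    public static void main(String[] args) {
        // One split of three lines: one mapper object, three map() calls.
        new LineMapper().run(List.of("x", "y", "z").iterator());
        System.out.println(constructed + " mapper(s), " + mapCalls + " map() calls");
    }
}
```

This is why expensive per-task work (opening connections, loading lookup tables) belongs in `setup` rather than in `map`: with a block of lines per task it is paid once per block instead of once per line.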

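Ted's suggestion to make the max split size small works because split sizing in the new-API `FileInputFormat` follows `max(minSize, min(maxSize, blockSize))`; the maximum can be set with `FileInputFormat.setMaxInputSplitSize(Job, long)`. The sketch below reimplements that formula and the 1.1 "slop" rule in plain Java to show how the number of splits, and hence mappers, grows as the max split size shrinks. It is a model, not Hadoop's code; `SplitSizeModel` is a hypothetical name and the file and block sizes are made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java sketch of byte-range split sizing, modeled on the new-API
 * FileInputFormat rule splitSize = max(minSize, min(maxSize, blockSize)).
 * The 1.1 slop factor lets the last split be up to 10% larger instead of
 * producing a tiny trailing split.
 */
public class SplitSizeModel {

    static final double SPLIT_SLOP = 1.1;

    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the lengths of the byte-range splits for one file. */
    public static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining); // final split absorbs any slop
        }
        return lengths;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // hypothetical 64 MB block
        long fileLength = 1_000_000;        // hypothetical ~1 MB input file
        // Default max split size (Long.MAX_VALUE): a single split, one mapper.
        long def = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        System.out.println(splitLengths(fileLength, def).size());
        // Max split size of 100,000 bytes: ten splits, ten mappers.
        long small = computeSplitSize(blockSize, 1, 100_000);
        System.out.println(splitLengths(fileLength, small).size());
    }
}
```

Because these are byte ranges, a small max split size gives each mapper roughly a fixed number of bytes' worth of whole lines (the line reader finishes any line that straddles a split boundary), not a fixed number of lines. When an exact line count per mapper matters, NLineInputFormat is the closer fit.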