hadoop-common-user mailing list archives

From anil gupta <anilg...@buffalo.edu>
Subject Re: Suggestion for InputSplit and InputFormat - Split every line.
Date Fri, 16 Mar 2012 01:38:56 GMT
Have a look at the NLineInputFormat class in Hadoop. It is built to split the
input by number of lines, so with one line per split you get one mapper per
line of the file.
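
To illustrate the idea, here is a minimal plain-Java sketch of the grouping that NLineInputFormat does. It is a model only, not Hadoop code: the class NLineSplitSketch and the method splitByLines are hypothetical names made up for this example, and each inner list stands in for one InputSplit (one mapper). In a real job with the newer org.apache.hadoop.mapreduce API, I believe the equivalent is job.setInputFormatClass(NLineInputFormat.class) together with NLineInputFormat.setNumLinesPerSplit(job, 1), but check which API your Hadoop version ships.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of line-based splitting; hypothetical names, not Hadoop API.
class NLineSplitSketch {

    // Groups the lines into consecutive chunks of at most linesPerSplit lines.
    // Each chunk stands in for one InputSplit, i.e. the input of one mapper.
    static List<List<String>> splitByLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            // The last chunk may be shorter when the line count is not a multiple.
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }
}
```

With linesPerSplit set to 1, a 3-line file of data source paths yields 3 splits of one line each, which is the one-mapper-per-line behaviour asked for below.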

On Thu, Mar 15, 2012 at 6:13 PM, Deepak Nettem <deepaknettem@gmail.com> wrote:

> Hi,
>
> I have this use case - I need to spawn as many mappers as the number of
> lines in a file in HDFS. This file isn't big (only 10-50 lines). Actually
> each line represents the path of another data source that the Mappers will
> work on. So each mapper will read 1 line, (the map() method will need to be
> called only once), and work on the data source.
>
> What's the best way to construct InputSplit, InputFormat and RecordReader
> to achieve this? I would appreciate any example code :)
>
> Best,
> Deepak
>



-- 
Thanks & Regards,
Anil Gupta
