hadoop-user mailing list archives

From Ted Xu <...@gopivotal.com>
Subject Re: one new bie question
Date Thu, 09 May 2013 07:25:04 GMT
Hi Balson,

Have you tried NLineInputFormat (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html)?
You can find an example of NLineInputFormat here: http://goo.gl/aVzDr.
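
The following is a minimal sketch of the map side for Hadoop Streaming in Python. It assumes the job is submitted with `-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat` (one line per split by default) and `-numReduceTasks 0`, which makes the job map-only. The binary name `./process_file` and the helper names are hypothetical. Depending on the input format, Streaming may hand the mapper either the bare line or `offset<TAB>line`, so the parser accepts both forms. The sketch also assumes the binary can read the path it is given. For a file in HDFS you may first need to copy it to local disk, for example with `hadoop fs -get`.

```python
#!/usr/bin/env python3
"""Map-only Hadoop Streaming mapper: run an external binary once per input line."""
import subprocess
import sys

# Hypothetical per-file binary; it must be shipped with the job (e.g. via -files).
BINARY = "./process_file"


def parse_record(raw):
    """Extract the file name from one streaming input record.

    Streaming may pass either "line" or "key<TAB>line" (key = byte offset),
    so keep only the part after the first tab when one is present.
    """
    line = raw.rstrip("\r\n")
    _, sep, value = line.partition("\t")
    return value.strip() if sep else line.strip()


def build_command(path):
    """Command line for one input file; the binary writes its own output."""
    return [BINARY, path]


def run_mapper(records, run=subprocess.run):
    """Run the binary for every non-empty record and emit nothing to stdout.

    check=True makes a failing binary fail the task, so Hadoop can retry it.
    """
    for raw in records:
        path = parse_record(raw)
        if path:
            run(build_command(path), check=True)


if __name__ == "__main__":
    run_mapper(sys.stdin)
```

Because the mapper writes nothing to stdout, the job's output directory (Streaming still requires `-output`) ends up holding only empty part files. The real results are whatever the binary writes.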

On Thu, May 9, 2013 at 2:53 PM, Balachandar R.A. wrote:

> Hello,
> I would like to explore the possibility of using the MapReduce framework for
> the following problem.
> I have a set of huge files, and I would like to execute a binary on each
> input file. The binary needs to operate on the whole file, so it is not
> possible to split the file into chunks. Let’s assume that I have six such
> files and that their names are listed in a single text file. I need to write
> Hadoop code that takes this single file as input, with every line in it
> going to its own map task. Each map task shall execute the binary on the file
> named in its line; the file can be located in HDFS. No reduce tasks are
> needed, and the map tasks shall not emit any output either. The binary takes
> care of creating its output file in the specified location.
> Is there a way to tell Hadoop to feed a single line to a map task? I came
> across a few examples in which a set of files is given, and it looks like
> the framework tries to split each file, reads every line in the split,
> generates key/value pairs, and sends these pairs to a single map task. In my
> situation, I want exactly one key/value pair to be generated per line, and
> each pair to be given to its own map task. That's all I need.
> For example, assume that this is my file, input.txt:
> myFirstInput.vlc
> mySecondInput.vlc
> myThirdInput.vlc
> Now, the first map task should get the pair <1, myFirstInput.vlc>, the
> second should get <2, mySecondInput.vlc>, and so on.
> Can someone shed some light on this problem? It looks straightforward to
> me, but I could not find any pointers on the web.
> With thanks and regards
> Balson

Ted Xu
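
To make the question's example concrete, here is a small Python simulation (not Hadoop code) of how NLineInputFormat, with its default of one line per split, would turn input.txt into map inputs. The function name `nline_splits` is hypothetical. One detail differs from the pairs Balson expected. NLineInputFormat reads lines with the same line record reader as TextInputFormat, so each key is the byte offset where the line starts in the file, not a line number. The three map tasks would therefore see keys 0, 17 and 35 rather than 1, 2 and 3.

```python
def nline_splits(data: bytes, n: int = 1):
    """Simulate NLineInputFormat: every n lines become one split (one map task).

    Each record is (byte_offset, line), mirroring the LongWritable/Text pairs
    produced by Hadoop's line record reader.
    """
    splits, current, offset = [], [], 0
    for raw in data.splitlines(keepends=True):
        # The key is where the line starts in the file, not its line number.
        current.append((offset, raw.rstrip(b"\r\n").decode("utf-8")))
        offset += len(raw)  # advance past the line and its terminator
        if len(current) == n:
            splits.append(current)
            current = []
    if current:  # the final split may hold fewer than n lines
        splits.append(current)
    return splits


input_txt = b"myFirstInput.vlc\nmySecondInput.vlc\nmyThirdInput.vlc\n"
for task_id, split in enumerate(nline_splits(input_txt), start=1):
    print(f"map task {task_id}: {split}")
# map task 1: [(0, 'myFirstInput.vlc')]
# map task 2: [(17, 'mySecondInput.vlc')]
# map task 3: [(35, 'myThirdInput.vlc')]
```

Passing `n=2` would pack two file names into each map task. In the new `org.apache.hadoop.mapreduce` API, the real job sets this with `NLineInputFormat.setNumLinesPerSplit(job, n)`.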
