hadoop-hdfs-user mailing list archives

From "Balachandar R.A." <balachandar...@gmail.com>
Subject One newbie question
Date Thu, 09 May 2013 06:53:17 GMT

I would like to explore the possibility of using the MapReduce framework for
the following problem.

I have a set of huge files, and I would like to execute a binary over each
input file. The binary needs to operate over the whole file, so the file
cannot be split into chunks. Let's assume that I have six such files and that
their names are listed in a single text file. I need to write Hadoop code
that takes this single file as input, with every line in it going to one map
task. Each map task should execute the binary on the file named on its line;
the file can be located in HDFS. No reduce tasks are needed, and no output is
emitted from the map tasks either; the binary takes care of creating the
output file in the specified location.
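The per-file step described above, invoking an external binary over one whole input file, can be sketched outside Hadoop as a plain Python function (the binary, its argument convention, and what it does with its output are assumptions for illustration, not part of any Hadoop API):

```python
import subprocess

def run_binary_on_file(binary_argv, input_path):
    """Run an external binary over one whole input file.

    binary_argv: the command and any fixed arguments,
                 e.g. ["/usr/local/bin/mytool"] (hypothetical binary).
    input_path:  one file name taken from a line of the list file.

    The binary is assumed to create its own output file; nothing is
    emitted back to the caller except the exit code.
    """
    result = subprocess.run(binary_argv + [input_path])
    return result.returncode
```

In a real job, a call like this would sit inside the map function, with `input_path` coming from the map task's input value.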
Is there a way to tell Hadoop to feed a single line to each map task? I came
across a few examples in which a set of files is given, and the framework
seems to split each file, read every line in the split, generate key/value
pairs, and send those pairs to a single map task. In my situation, exactly
one key/value pair should be generated per line, and each pair should go to
its own map task. That's it.
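For what it's worth, Hadoop does ship an input format aimed at this: NLineInputFormat (`org.apache.hadoop.mapreduce.lib.input.NLineInputFormat`), which splits the input so that each mapper receives N lines, and N = 1 gives one line per map task. The pair generation being asked for, one (line number, file name) pair per line, can be simulated in plain Python (note that NLineInputFormat's actual key is the byte offset of the line, not a 1-based line number):

```python
def pairs_from_list_file(lines):
    # One (key, value) pair per non-empty line:
    #   key   = 1-based line number (as described in the question;
    #           Hadoop's NLineInputFormat really keys by byte offset),
    #   value = the file name on that line.
    return [(i, name.strip())
            for i, name in enumerate(lines, start=1)
            if name.strip()]
```

Each pair would then be handed to its own map task when the split size is one line.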

For example, assume this is my file <input.txt>:

myFirstInput.vlc
mySecondInput.vlc
...

Now, the first map task should get the pair <1, myFirstInput.vlc>, the second
the pair <2, mySecondInput.vlc>, and so on.
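Putting the two pieces together, one possible sketch of the mapper side, in the style of a Hadoop Streaming mapper, might look as follows (the binary path and output directory are hypothetical, and whether each input line carries a key before a tab depends on the configured input format, hence the defensive split):

```python
import subprocess
import sys

def map_lines(lines, binary, output_dir):
    """Streaming-style mapper: run `binary` once per file name.

    lines:      an iterable of text lines (e.g. sys.stdin).
    binary:     path to the external program (hypothetical).
    output_dir: where the binary is assumed to write its results.

    Emits nothing; returns how many files were processed.
    """
    processed = 0
    for line in lines:
        # Some input formats prefix "key\t"; keep only the value part.
        path = line.rstrip("\n").split("\t")[-1]
        if not path:
            continue
        subprocess.run([binary, path, output_dir], check=True)
        processed += 1
    return processed

if __name__ == "__main__":
    # e.g. used as:  -mapper "mapper.py /path/to/binary /out/dir"
    map_lines(sys.stdin, sys.argv[1], sys.argv[2])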

Can someone throw some light on this problem? To me it looks
straightforward, but I could not find any pointers on the web.

With thanks and regards
