hadoop-user mailing list archives

From "Balachandar R.A." <balachandar...@gmail.com>
Subject Re: one new bie question
Date Thu, 09 May 2013 08:12:08 GMT
Wow,

That's exactly what I want.

Thanks a lot

Balson
On 9 May 2013 13:16, "Ted Xu" <txu@gopivotal.com> wrote:

> Hi Balson,
>
> Have you tried NLineInputFormat<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html>?
> You can find an example of NLineInputFormat here: http://goo.gl/aVzDr.
>
>
> On Thu, May 9, 2013 at 2:53 PM, Balachandar R.A. <balachandar.ra@gmail.com> wrote:
>
>>
>> Hello
>>
>> I would like to explore whether the MapReduce framework can be used for
>> the following problem.
>>
>> I have a set of huge files, and I would like to execute a binary on each
>> input file. The binary needs to operate on the whole file, so it is not
>> possible to split a file into chunks. Let's assume that I have six such
>> files and that their names are listed in a single text file. I need to
>> write Hadoop code that takes this single file as input, with every line in
>> it going to its own map task. Each map task shall execute the binary on the
>> file named in its line; the file can be located in HDFS. No reduce tasks
>> are needed, and the map tasks shall not emit any output either. The binary
>> takes care of creating its output file in the specified location.
>>
>> Is there a way to tell Hadoop to feed a single line to a map task? I came
>> across a few examples in which a set of files is given, and it looks like
>> the framework splits each file, reads every line in a split, generates
>> key/value pairs, and sends all of these pairs to a single map task. In my
>> situation, I want exactly one key/value pair to be generated per line, and
>> each pair should go to its own map task. That's it.
>>
>> For example, assume that this is my file <input.txt>:
>>
>> myFirstInput.vlc
>> mySecondInput.vlc
>> myThirdInput.vlc
>>
>> Now, the first map task should get the pair <1, myFirstInput.vlc>, the
>> second should get the pair <2, mySecondInput.vlc>, and so on.
>>
>> Can someone shed some light on this problem? To me it looks
>> straightforward, but I could not find any pointers on the web.
>>
>> With thanks and regards
>> Balson
>>
>
> --
> Regards,
> Ted Xu
>
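
For readers of the archive, here is a rough sketch of the record layout the suggestion above should produce. It is plain Java with no Hadoop dependency, so it only imitates the behaviour. The class `OneLinePerSplitSketch`, the `Rec` type and the `splits` method are made-up names for illustration, not Hadoop APIs. The assumption is that NLineInputFormat groups the input file into splits of N lines (N is believed to default to 1) and reads them with the usual line reader, so the key is the byte offset of the line in the file rather than a line number such as 1, 2, 3.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone imitation of "N lines per split"; not Hadoop code.
class OneLinePerSplitSketch {

    /** One (key, value) record: byte offset of the line, and the line text. */
    static final class Rec {
        final long offset;
        final String line;

        Rec(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return "<" + offset + ", " + line + ">";
        }
    }

    /** Groups newline-separated content into splits of linesPerSplit records. */
    static List<List<Rec>> splits(String content, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<List<Rec>> result = new ArrayList<>();
        if (content.isEmpty()) {
            return result;
        }
        List<Rec> current = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            current.add(new Rec(offset, line));
            // Advance past the line's bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (current.size() == linesPerSplit) {
                result.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // last split may hold fewer lines
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "myFirstInput.vlc\nmySecondInput.vlc\nmyThirdInput.vlc\n";
        List<List<Rec>> tasks = splits(input, 1);
        for (int i = 0; i < tasks.size(); i++) {
            // Each inner list stands for the records one map task would see.
            System.out.println("map task " + i + " -> " + tasks.get(i));
        }
    }
}
```

With one line per split, the three file names from the question land in three separate splits, keyed 0, 17 and 35 rather than 1, 2 and 3. A mapper that only needs the file name can ignore the key. In a real job the reducer count would also be set to zero, as the question asks for map-only processing.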
