hadoop-user mailing list archives

From jamal sasha <jamalsha...@gmail.com>
Subject Re: modifying existing wordcount example
Date Thu, 17 Jan 2013 02:54:04 GMT
Hi,
Thanks for sharing your thoughts.
I was reading about some of the Hadoop libraries, and I think the distributed
cache might help me.
But I picked up Hadoop very recently (and Java along with it), and I can't
yet work out how to actually code this. :(


On Wed, Jan 16, 2013 at 6:13 PM, Chris Embree <cembree@gmail.com> wrote:

> Can you instead copy input1 and input2 together?
>
> Or process both files on the second pass?
>
> Otherwise, you'll have to read in the output file, load the values, and
> then start your map/reduce job.
>
> Probably someone else will have a better answer. :)
>
>
> On Wed, Jan 16, 2013 at 9:07 PM, jamal sasha <jamalshasha@gmail.com> wrote:
>
>> Hi,
>>   In the wordcount example:
>> http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html
>>  Let's say I run the above example and save the output.
>> Now let's say I have a new input file. What I want to do is
>> run the wordcount again, this time updating the previous
>> counts.
>> For example:
>> sample_input1.txt  // foo bar foo bar bar bar
>> After the first run:
>> 1) foo 2
>> 2) bar 4
>>
>> Save it in output1.txt
>>
>> Now sample_input2.txt  // bar hello world
>> The result I am looking for is:
>> 1) foo 2
>> 2) bar 5
>> 3) hello 1
>> 4) world 1
>>
>> How do I achieve this in MapReduce?
>>
>>
>
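
One way to read the "process both files on the second pass" suggestion is sketched below in plain Java, with no Hadoop classes, so it only simulates the map and reduce steps in memory. It assumes the saved output has one "word count" pair per line, separated by whitespace (a tab in the sketch). All names in it (IncrementalWordCount, mapNewText, mapPreviousOutput, reduce, update) are hypothetical and are not part of any Hadoop API. Lines of the old output are mapped to (word, previousCount), words of the new input are mapped to (word, 1), and the same summing reduce handles both.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation only; a real job would put this logic in a Mapper and a Reducer.
class IncrementalWordCount {

    // "Map" step for new text: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> mapNewText(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
            }
        }
        return out;
    }

    // "Map" step for one line of the saved output.
    // Assumption: each line looks like "word<whitespace>count".
    static Map.Entry<String, Integer> mapPreviousOutput(String line) {
        String[] parts = line.trim().split("\\s+");
        return new AbstractMap.SimpleEntry<String, Integer>(parts[0], Integer.parseInt(parts[1]));
    }

    // "Reduce" step: sum every value that shares a key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Feed the old counts and the new text through the same reduce.
    static Map<String, Integer> update(List<String> previousOutput, List<String> newInput) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<Map.Entry<String, Integer>>();
        for (String line : previousOutput) {
            pairs.add(mapPreviousOutput(line));
        }
        for (String line : newInput) {
            pairs.addAll(mapNewText(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> output1 = Arrays.asList("foo\t2", "bar\t4"); // saved first-run counts
        List<String> input2 = Arrays.asList("bar hello world");   // new input file contents
        for (Map.Entry<String, Integer> e : update(output1, input2).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the sample data from the question, this gives foo 2, bar 5, hello 1, world 1 (printed in alphabetical order because of the TreeMap). The point of the design is that the reducer does not need to know which input a pair came from, so the old output can be treated as one more input to the second job.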
