hadoop-mapreduce-user mailing list archives

From Anfernee Xu <anfernee...@gmail.com>
Subject Re: how to implement post-mapper processing
Date Wed, 25 Aug 2010 15:59:51 GMT
Yes, that works if the node has only a single split. If it has multiple
splits, it's still a problem, since not all of the data has been processed.
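
To make the concern concrete, here is a minimal sketch in plain Java. It
does not use the Hadoop classes; SketchMapper and runNode are hypothetical
names that only imitate the per-task lifecycle, in which each split gets its
own mapper instance and cleanup is called once at the end of that split. It
also assumes the tasks on a node run one after another, which a real cluster
does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitLifecycleSketch {

    /** Hypothetical stand-in for a mapper; NOT org.apache.hadoop.mapreduce.Mapper. */
    static class SketchMapper {
        private final String taskId;
        private final List<String> log;

        SketchMapper(String taskId, List<String> log) {
            this.taskId = taskId;
            this.log = log;
        }

        void map(String record) {
            log.add(taskId + ":map:" + record);
        }

        // Post-processing placed here runs once per split, not once per node.
        void cleanup() {
            log.add(taskId + ":cleanup");
        }

        // Imitates the per-task lifecycle: map every record of one split,
        // then call cleanup exactly once.
        void run(List<String> split) {
            for (String record : split) {
                map(record);
            }
            cleanup();
        }
    }

    /** Runs one mapper per split, as if all splits were assigned to one node. */
    static List<String> runNode(List<List<String>> splitsOnNode) {
        List<String> log = new ArrayList<>();
        int task = 0;
        for (List<String> split : splitsOnNode) {
            new SketchMapper("task" + task, log).run(split);
            task++;
        }
        return log;
    }

    public static void main(String[] args) {
        // Two splits on the same node: the first cleanup is logged before
        // the second split's record "c" has been mapped.
        List<String> log = runNode(Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c")));
        for (String event : log) {
            System.out.println(event);
        }
    }
}
```

With two splits on one node, the first cleanup call happens while the
second split is still unprocessed, so node-wide post-processing cannot
safely live there.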


On Wed, Aug 25, 2010 at 11:08 PM, David Rosenstrauch <darose@darose.net> wrote:

> On 08/25/2010 10:36 AM, Anfernee Xu wrote:
>
>> Thanks all for your help.
>>
>> The challenge is this: suppose I have 4 datanodes in the cluster, but a
>> given input has 2 splits, so only 2 of the 4 nodes will run the M/R job,
>> say nodeA and nodeB. After the job completes, the data from the input has
>> been stored in the datastore on nodeA and nodeB; nodeC and nodeD are
>> untouched at this point. I now need to run post-processing on nodeA and
>> nodeB to get my data ready. Originally I thought I could run another M/R
>> job, also with 2 splits, but I cannot tell which nodes will be selected
>> to run those splits. I expected the same nodes to be selected.
>>
>> Anfernee
>>
>
> Well then you could put your post-processing in Mapper.cleanup.
>
>
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup%28org.apache.hadoop.mapreduce.Mapper.Context%29
>
> DR
>



-- 
--Anfernee
