hadoop-user mailing list archives

From "Naganarasimha G R (Naga)" <garlanaganarasi...@huawei.com>
Subject RE: Re: Prune out data to a specific reduce task
Date Fri, 13 Mar 2015 03:25:05 GMT
I think Drake's comment
"In the map method, records would be ignored with no output.collect() or context.write()."
is the most valid way to do it: unwanted records are pruned at the source itself, which avoids
further processing downstream and so consumes fewer resources.
Is there any obstacle to doing this in your map method?
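
A minimal sketch of that map-side pruning, in plain Java so it runs without the Hadoop jars. It assumes the job uses the default hash partitioning (key hash masked to non-negative, modulo the number of reduce tasks). The names PruneSketch, partitionFor, shouldEmit and emitted are hypothetical, and the output list only stands in for context.write().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PruneSketch {
    // Same formula as Hadoop's default HashPartitioner (assumption: the job
    // has not configured a custom partitioner).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // True if the map method should write this record at all.
    static boolean shouldEmit(Object key, int numReduceTasks, int prunedPartition) {
        return partitionFor(key, numReduceTasks) != prunedPartition;
    }

    // Stand-in for the body of map(): in a real Mapper the add() call would be
    // context.write(key, value); pruned records are simply never written.
    static List<String> emitted(List<String> keys, int numReduceTasks, int prunedPartition) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (shouldEmit(key, numReduceTasks, prunedPartition)) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        // Partitions are 0-based, so the second of 2 reduce tasks is partition 1.
        System.out.println(emitted(keys, 2, 1));
    }
}
```

With this approach the second reduce task may still be launched, but it receives no input. Note that String.hashCode() is used here for illustration; a Hadoop key type such as Text has its own hashCode(), so the keys that land in a given partition will differ in a real job.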

From: xeonmailinglist-gmail [xeonmailinglist@gmail.com]
Sent: Thursday, March 12, 2015 22:17
To: user@hadoop.apache.org
Subject: Fwd: Re: Prune out data to a specific reduce task

If I use the partitioner, I must be able to tell MapReduce not to process the values of a
certain reduce task.

The method public int getPartition(K key, V value, int numReduceTasks) must always return
a partition. I can’t return -1. Thus, I don’t know how to tell MapReduce not to process
the data of a partition. Any suggestions?

-------- Forwarded Message --------
Subject: Re: Prune out data to a specific reduce task
Date: Thu, 12 Mar 2015 12:40:04 -0400
From: Fei Hu <hufei68@gmail.com>
Reply-To: user@hadoop.apache.org
To: user@hadoop.apache.org

Maybe you could use Partitioner.class to solve your problem.

On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <xeonmailinglist@gmail.com> wrote:

I have this job that has 3 map tasks and 2 reduce tasks. But I want to exclude the data that
would go to reduce task 2. This means that only reducer 1 will produce data, and the other
one will be empty, or will not even execute.

How can I do this in MapReduce?



