hadoop-mapreduce-user mailing list archives

From Drake민영근 <drake....@nexr.com>
Subject Re: Prune out data to a specific reduce task
Date Mon, 16 Mar 2015 01:04:24 GMT
Hi,

If you write a custom partitioner, you can call it directly to confirm which
partition a key maps to.

You can get the number of reducers from mapcontext.getNumReduceTasks(). Then
get the reducer number from Partitioner.getPartition(key, value,
numReduceTasks). Finally, write only the wanted records to the reducers.
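
The steps above can be sketched in plain Java without the Hadoop classes. This
is only an illustration under assumptions: PruneSketch, partitionFor and
keepForPartition are hypothetical names, the keys are plain Strings, and the
partition formula mirrors what the default HashPartitioner computes. In a real
Mapper you would take numReduceTasks from the context, call your job's own
Partitioner, and skip context.write() for unwanted records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PruneSketch {
    // Assumption: same formula as Hadoop's default HashPartitioner,
    // applied here to String.hashCode() (Hadoop's Text hashes differently).
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Keep only the keys that would be routed to wantedPartition.
    // In a Mapper, "keep" would mean calling context.write(key, value).
    public static List<String> keepForPartition(List<String> keys,
                                                int numReduceTasks,
                                                int wantedPartition) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, numReduceTasks) == wantedPartition) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        // Two reducers; keep only records destined for partition 0.
        System.out.println(keepForPartition(keys, 2, 0)); // prints [b, d]
    }
}
```

Note that partition numbers are 0-based, so with two reducers "reduce task 2"
would be partition 1.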

Caution: this approach undermines the parallelism of the MapReduce programming
model. If you cut the records for Reducer 2, the task still starts but has
nothing to do.

Thanks.

Drake 민영근 Ph.D
kt NexR

On Fri, Mar 13, 2015 at 11:47 PM, xeonmailinglist-gmail <
xeonmailinglist@gmail.com> wrote:

>  Hi,
>
> The only obstacle is to know to which partition the map output would go.
> 1 ~ From the map method, how can I know which partition the output goes to?
> 2 ~ Can I call getPartition(K key, V value, int numReduceTasks) from the
> map function?
>
> Thanks,
>
> On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
>
> I think Drake's comment
> "In the map method, records would be ignored with no output.collect() or
> context.write()."
> is the most valid way to do it, as it avoids further processing downstream
> and hence consumes fewer resources, since unwanted records are pruned
> at the source itself.
> Is there any obstacle to doing this in your map method?
>
>  Regards,
> Naga
>  ------------------------------
> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
> *Sent:* Thursday, March 12, 2015 22:17
> *To:* user@hadoop.apache.org
> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>
>   If I use the partitioner, I must be able to tell MapReduce not to
> process values for certain reduce tasks.
>
> The method public int getPartition(K key, V value, int numReduceTasks)
> must always return a partition. I can't return -1. Thus, I don't know how
> to tell MapReduce not to process data from a partition. Any suggestions?
>
> -------- Forwarded Message --------
>
> Subject: Re: Prune out data to a specific reduce task
> Date: Thu, 12 Mar 2015 12:40:04 -0400
> From: Fei Hu <hufei68@gmail.com>
> Reply-To: user@hadoop.apache.org
> To: user@hadoop.apache.org
>
> Maybe you could use Partitioner.class to solve your problem.
>
> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <
> xeonmailinglist@gmail.com> wrote:
>
>  Hi,
>
> I have a job with 3 map tasks and 2 reduce tasks, but I want to
> exclude the data that would go to reduce task 2. This means that only
> reducer 1 will produce data, and the other one will be empty or will not
> even execute.
>
> How can I do this in MapReduce?
>
> <ExampleJobExecution.png>
>
>
> Thanks,
