hadoop-mapreduce-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Prune out data to a specific reduce task
Date Mon, 16 Mar 2015 08:08:32 GMT
Hi,
Can you set only one reduce task? Why do you want to set up two reduce tasks
and have only one of them do any work?


On Mon, Mar 16, 2015 at 9:04 AM, Drake민영근 <drake.min@nexr.com> wrote:

> Hi,
>
> If you write a custom partitioner, just call it to confirm which partition
> the key maps to.
>
> You can get the number of reducers from mapcontext.getNumReduceTasks().
> Then get the reducer number from Partitioner.getPartition(key, value,
> numReduc). Finally, write only the wanted records to the reducers.
>
> Caution: This approach largely breaks the parallelism of the MapReduce
> programming model. If you drop the records for Reducer 2, the task still
> starts but has nothing to process.
>
> Thanks.
>
> Drake 민영근 Ph.D
> kt NexR
>
> On Fri, Mar 13, 2015 at 11:47 PM, xeonmailinglist-gmail <
> xeonmailinglist@gmail.com> wrote:
>
>>  Hi,
>>
>> The only obstacle is knowing which partition the map output will go to.
>> 1 ~ From the map method, how can I know which partition the output goes to?
>> 2 ~ Can I call getPartition(K key, V value, int numReduceTasks) from the
>> map function?
>>
>> Thanks,
>>
>>
>> On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
>>
>> I think Drake's comment
>> "In the map method, records would be ignored with no output.collect() or
>> context.write()."
>> is the most valid way to do it, as it avoids further processing downstream
>> and hence consumes fewer resources, since unwanted records are pruned
>> at the source itself.
>> Is there any obstacle to doing this in your map method?
>>
>>  Regards,
>> Naga
>>  ------------------------------
>> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
>> *Sent:* Thursday, March 12, 2015 22:17
>> *To:* user@hadoop.apache.org
>> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>>
>> If I use the partitioner, I must be able to tell MapReduce not to
>> process the values for a certain reduce task.
>>
>> The method public int getPartition(K key, V value, int numReduceTasks)
>> must always return a partition. I can't return -1. Thus, I don't know how
>> to tell MapReduce not to process the data from a partition. Any suggestions?
>>
>> ———— Forwarded Message ————
>>
>> Subject: Re: Prune out data to a specific reduce task
>>
>> Date: Thu, 12 Mar 2015 12:40:04 -0400
>>
>> From: Fei Hu <hufei68@gmail.com>
>>
>> Reply-To: user@hadoop.apache.org
>>
>> To: user@hadoop.apache.org
>>
>> Maybe you could use Partitioner.class to solve your problem.
>>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <
>> xeonmailinglist@gmail.com> wrote:
>>
>>  Hi,
>>
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to
>> exclude the data that would go to reduce task 2. This means that only
>> reducer 1 will produce data, and the other one will be empty or will not
>> even execute.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>
>
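
The map-side pruning discussed in this thread can be sketched roughly as follows. This is a minimal, Hadoop-free illustration, not the posters' actual code: the class PruneSketch and its methods are hypothetical names, and plain String keys stand in for Hadoop Writable keys (whose hashCode() values differ). The partition formula mirrors the one used by Hadoop's default HashPartitioner. Partitions are numbered from 0, so "reduce task 2" in the thread would correspond to partition 1 here. In a real Mapper, numReduceTasks would come from context.getNumReduceTasks() and kept records would be emitted with context.write().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained stand-in for the logic inside a Mapper.
final class PruneSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Stand-in for the body of map(): emit only keys bound for the wanted partition.
    public static List<String> mapSidePrune(List<String> keys,
                                            int numReduceTasks,
                                            int wantedPartition) {
        List<String> emitted = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, numReduceTasks) == wantedPartition) {
                emitted.add(key); // in a real Mapper: context.write(key, value)
            }
            // Otherwise nothing is written, so the record never reaches the shuffle.
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        // Two reduce tasks; keep only the records bound for partition 0.
        System.out.println(mapSidePrune(keys, 2, 0)); // prints [b, d]
    }
}
```

As noted in the caution above, the reduce task for the pruned partition would still be launched; it would simply receive no input. If a custom Partitioner is configured for the job, the same partitioner would have to be used for this check, otherwise the check and the actual routing could disagree.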
