hadoop-hdfs-user mailing list archives

From xeonmailinglist-gmail <xeonmailingl...@gmail.com>
Subject Re: Prune out data to a specific reduce task
Date Fri, 13 Mar 2015 14:47:34 GMT
Hi,

The only obstacle is knowing which partition the map output will go to.
1 ~ From the map method, how can I know which partition the output will go to?
2 ~ Can I call |getPartition(K key, V value, int numReduceTasks)| from 
the map function?

Thanks,
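
To make the two questions above concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the job uses the default HashPartitioner, whose getPartition() computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The class name PartitionFilter and the helpers partitionFor() and keep() are hypothetical names used only for illustration. Note that String.hashCode() differs from Text.hashCode(), so the partition numbers here will not match those of a real job with Text keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionFilter {

    // Same arithmetic as Hadoop's default HashPartitioner.getPartition():
    // clear the sign bit, then take the remainder by the number of reducers.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Stand-in for the map-side check: keep a key only when it would NOT
    // be routed to the excluded partition. In a Mapper, "keeping" would
    // mean calling context.write(); dropping means simply not writing.
    static List<String> keep(List<String> keys, int numReduceTasks, int excluded) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, numReduceTasks) != excluded) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumption: 2 reduce tasks, and partition index 1 is the one to prune.
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date");
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 2));
        }
        System.out.println("kept: " + keep(keys, 2, 1));
    }
}
```

In a real Mapper the same check would probably use the job's own partitioner rather than a copy of the formula: Partitioner.getPartition() is a public method, and the partitioner class and reducer count should be obtainable from the context via getPartitionerClass() and getNumReduceTasks(). Treat that wiring as an assumption to verify against the Hadoop version in use. Partition indexes are zero-based, so "reduce task 2" in a two-reducer job corresponds to index 1.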




On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
> I think Drake's comment
> "In the map method, records would be ignored with no output.collect() 
> or context.write()."
> is the most valid way to do it, as it avoids further processing 
> downstream and hence consumes fewer resources, since unwanted 
> records are pruned at the source itself.
> Is there any obstacle to doing this in your map method?
>
> Regards,
> Naga
> ------------------------------------------------------------------------
> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
> *Sent:* Thursday, March 12, 2015 22:17
> *To:* user@hadoop.apache.org
> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>
> If I use the partitioner, I must be able to tell MapReduce not to 
> process the values destined for certain reduce tasks.
>
> The method |public int getPartition(K key, V value, int 
> numReduceTasks)| must always return a partition. I can’t return -1. 
> Thus, I don’t know how to tell MapReduce not to process the data of a 
> partition. Any suggestions?
>
> ———— Forwarded Message ————
>
> Subject: Re: Prune out data to a specific reduce task
>
> Date: Thu, 12 Mar 2015 12:40:04 -0400
>
> From: Fei Hu <hufei68@gmail.com>
>
> Reply-To: user@hadoop.apache.org
>
> To: user@hadoop.apache.org
>
> Maybe you could use Partitioner.class to solve your problem.
>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail 
>> <xeonmailinglist@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to 
>> exclude the data that would go to reduce task 2. This means that 
>> only reducer 1 will produce data, and the other one will be empty 
>> or will not execute at all.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>
>> -- 
>> --
>

-- 
--

