hadoop-hdfs-user mailing list archives

From 杨浩 <yangha...@gmail.com>
Subject Re: trying to understand HashPartitioner
Date Fri, 27 Mar 2015 01:46:01 GMT
It's not the number of the reduce task, but the ID of the reduce task.
A given <k2, v2> pair is handled by only one reduce task.

In MRv2, each reduce task has an ID, such as 0, 1, 2, 3, 4. The result of
getPartition is a reduce task ID, and the <k2, v2> pair will be processed by
that reduce task.
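
To make the mapping from key to reduce task ID concrete, here is a minimal
sketch in plain Java with no Hadoop dependency. The class name
HashPartitionSketch and the sample keys are made up for illustration. The
expression inside getPartition is, as far as I know, the one used by
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner, but please check
the source of your Hadoop version before relying on it.

```java
// Minimal sketch of hash partitioning, with no Hadoop dependency.
// HashPartitionSketch is a hypothetical name, not a Hadoop class.
class HashPartitionSketch {

    // Mirrors the getPartition(K2 key, V2 value, int numReduceTasks) signature.
    // The value is unused: only the key decides the partition.
    static <K, V> int getPartition(K key, V value, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode still yields an ID in the range [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 5; // reduce task IDs are 0..4
        for (String key : new String[] {"apple", "banana", "apple", "cherry"}) {
            System.out.println(key + " -> reduce task "
                    + getPartition(key, 1, numReduceTasks));
        }
    }
}
```

Both occurrences of "apple" print the same reduce task ID, because the ID
depends only on the key's hashCode and the number of reduce tasks.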

2015-03-19 7:27 GMT+08:00 Jianfeng (Jeff) Zhang <jzhang@hortonworks.com>:

>
>  You can think of it as similar to Java's HashMap: the hashCode of an
> object is used to distribute it into different buckets.
>
>
>
>  Best Regard,
> Jeff Zhang
>
>
>   From: xeonmailinglist-gmail <xeonmailinglist@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Wednesday, March 18, 2015 at 7:08 PM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: trying to understand HashPartitioner
>
>  What determines which partition will run on which reduce task?
>
> On 18-03-2015 09:30, xeonmailinglist-gmail wrote:
>
>  Hi,
>
> I am trying to understand how HashPartitioner.java works. Thus, I ran a
> MapReduce job with 5 reducers and 5 input files. I thought that the output
> of getPartition(K2 key, V2 value, int numReduceTasks) was the number of the
> reduce task that will process K2 and V2. Is this correct?
