hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: Total order partitioner [Modified]
Date Mon, 09 Aug 2010 16:46:54 GMT
The sampling is done at the master node by accessing the splits before the job 
is submitted. By default, the partitioner sends each key to exactly one 
partition, unless you modify it.
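
To illustrate why the ranges are mutually exclusive, here is a minimal, self-contained sketch of range partitioning. It is not Hadoop's actual code: the class name RangePartitionSketch and both method names are hypothetical, and the split-point selection is a simplification of what a sampler might do. The idea is that sorted split points plus a binary search give every key exactly one partition.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Hadoop TotalOrderPartitioner source.
class RangePartitionSketch {

    // Choose numPartitions - 1 split points at evenly spaced positions
    // in the sorted sample (a simplified stand-in for input sampling).
    static String[] splitPoints(String[] samples, int numPartitions) {
        String[] sorted = samples.clone();
        Arrays.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    // Partition i receives keys k with splits[i-1] <= k < splits[i].
    // A key equal to a split point goes to the partition on its right,
    // so the same key always lands in the same single partition.
    static int partitionFor(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        String[] samples = {"H", "A", "D", "Y", "C", "D"};
        String[] splits = splitPoints(samples, 3);
        System.out.println("split points: " + Arrays.toString(splits));
        for (String key : samples) {
            System.out.println(key + " -> partition " + partitionFor(key, splits));
        }
    }
}
```

With the sample keys above and three partitions, the split points come out as D and H, so A and C go to partition 0, D goes to partition 1, and H and Y go to partition 2. Sending a key such as D to more than one partition would require replacing the lookup with custom logic.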

-Gang




----- Original Message ----
From: abc xyz <fabc_xyz111@yahoo.com>
To: common-user@hadoop.apache.org
Sent: 2010/8/9 (Mon) 11:30:11 AM
Subject: Total order partitioner [Modified]


1) The input splits are sampled when we use the total order partitioner provided 
in Hadoop 0.19. I want to know how and when this sampling is done. Is it done 
before the master allocates tasks to the nodes, since the sampling file also has 
to be added to the distributed cache? If so, is the sampling carried out at the 
master node? Does the master then have to access the input splits to get the 
samples?

2) Also, does the total order partitioner allow ranges where a key can belong 
to more than one range? I mean something like A, C, D, D, H, Y, where keys 
from A to C are sent to one partition, keys from C to D are sent to a 2nd 
partition, keys with value D can be sent randomly to either the 2nd or 3rd 
partition, and so on. Or are these ranges mutually exclusive?


      
