hadoop-common-user mailing list archives

From abc xyz <fabc_xyz...@yahoo.com>
Subject Total order partitioner [Modified]
Date Mon, 09 Aug 2010 15:30:11 GMT

1) The input splits are sampled when we use the total order partitioner provided
in Hadoop 0.19. I want to know how and when this sampling is done. Is it done
before the master allocates tasks to the nodes, given that the sampling file
also has to be added to the distributed cache? If so, is the sampling carried
out on the master node, which would mean the master has to access the input
splits to get the samples?

2) Also, does the total order partitioner allow ranges in which a key can belong
to more than one range? I mean something like this: A, C, D, D, H, Y, where
keys from A to C are sent to one partition, keys from C to D are sent to the 2nd
partition, keys with value D can be sent randomly to either the 2nd or 3rd
partition, and so on. Or are these ranges mutually exclusive?
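
To make the second question concrete, here is a minimal sketch of what the
mutually exclusive case might look like, assuming the partition is chosen by a
binary search over sorted, distinct split points. The function name
`partition_for` and the split points C, D, H are hypothetical and chosen only
for illustration; this is not taken from the Hadoop source.

```python
from bisect import bisect_right

def partition_for(key, split_points):
    """Return the partition index for `key`.

    Assumes `split_points` is sorted and has no duplicates, so N split
    points define N + 1 non-overlapping ranges. A key equal to a split
    point goes to the range that starts at that split point.
    """
    return bisect_right(split_points, key)

# Hypothetical split points giving 4 partitions:
#   partition 0: keys <  "C"
#   partition 1: "C" <= keys < "D"
#   partition 2: "D" <= keys < "H"
#   partition 3: keys >= "H"
split_points = ["C", "D", "H"]

for key in ["A", "C", "D", "D", "H", "Y"]:
    print(key, "->", partition_for(key, split_points))
```

Under this assumption both D keys land in the same partition (index 2), so no
key can belong to two ranges. The overlapping scheme described above would
need some extra rule for choosing between partitions when a key equals a
repeated boundary.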
