hadoop-common-user mailing list archives

From Shi Yu <sh...@uchicago.edu>
Subject Prime number of reduces vs. linear hash function
Date Sun, 24 Oct 2010 02:00:46 GMT
One suggestion is to set the number of reducers to the prime number
closest to the number of nodes, and the number of mappers to the prime
number closest to several times the number of nodes in the cluster. But
another source says: "There is no need for the number of reduces to be
prime. The only thing it helps is if you are using the HashPartitioner
and your key's hash function is too linear. In practice, you usually
want to use 99% of your reduce capacity of the cluster."
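A minimal sketch of what "too linear" can mean, assuming keys whose hash codes step by a fixed stride. The partition formula below mirrors the one Hadoop's HashPartitioner uses, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, but is reimplemented in plain Java so it runs without Hadoop. The class name PrimeReducerDemo, the stride of 4, and the 1000 keys are hypothetical. When hashes step by a stride s, they land on only R / gcd(s, R) of the R reducers. A prime R shares no factor with any stride that is not a multiple of R, so every reducer receives keys.

```java
import java.util.Arrays;

public class PrimeReducerDemo {

    // Same arithmetic as HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int getPartition(int hashCode, int numReduceTasks) {
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts keys per reducer when key hashes form the linear
    // sequence 0, stride, 2*stride, ... (a hypothetical "too linear" hash).
    static int[] histogram(int numKeys, int stride, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int i = 0; i < numKeys; i++) {
            int hash = i * stride;
            counts[getPartition(hash, numReduceTasks)]++;
        }
        return counts;
    }

    // Number of reducers that received no keys at all.
    static long emptyReducers(int[] counts) {
        return Arrays.stream(counts).filter(c -> c == 0).count();
    }

    public static void main(String[] args) {
        int numKeys = 1000;
        int stride = 4; // hypothetical: every key hash is a multiple of 4
        int[] composite = histogram(numKeys, stride, 12);
        int[] prime = histogram(numKeys, stride, 13);
        System.out.println("12 reducers: " + Arrays.toString(composite)
                + ", empty = " + emptyReducers(composite));
        System.out.println("13 reducers: " + Arrays.toString(prime)
                + ", empty = " + emptyReducers(prime));
    }
}
```

With 12 reducers, gcd(4, 12) = 4, so only reducers 0, 4 and 8 get keys (334, 333 and 333) while the other 9 sit idle. With 13 reducers every reducer gets 76 or 77 keys. If the hash function already mixes its bits well, both reducer counts spread the keys evenly, which is the point of the quoted reply.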

Could anyone explain the theory behind using a prime number of reducers
and how it relates to the hash function?

Shi

