hadoop-common-user mailing list archives

From Bai Shen <baishen.li...@gmail.com>
Subject Task location determination
Date Wed, 04 Jan 2012 14:34:17 GMT
I have a test Hadoop cluster set up using Cloudera.  It consists of a
NameNode and three DataNodes.  When I submit jobs, the tasks pile up on
one node instead of being distributed round-robin across the nodes.

I understand that Hadoop tries to run a task where its data is located,
but with only three DataNodes and a replication factor of 3, wouldn't that
mean that the same data is on every single machine?  Why would it not
spread the tasks out over all of the machines instead of clumping them on
one and leaving the others idle?
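
To make the reasoning above concrete, here is a minimal sketch of the replica-placement argument. It is a toy model only: the host names, block IDs, and the `place_replicas` helper are hypothetical, and it does not reproduce the real HDFS placement policy or the scheduler. It only checks that with three nodes and a replication factor of 3, every block is local to every node.

```python
import random

def place_replicas(blocks, nodes, replication, seed=0):
    """Toy model: put each block on `replication` distinct nodes.

    This is an illustration only, not the actual HDFS placement policy.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in blocks}

nodes = ["datanode1", "datanode2", "datanode3"]  # hypothetical host names
blocks = [f"blk_{i}" for i in range(5)]          # hypothetical block IDs
placement = place_replicas(blocks, nodes, replication=3)

# True when every node holds a replica of every block, i.e. any node
# could run any task with node-local data.
local_everywhere = all(holders == set(nodes) for holders in placement.values())
print(local_everywhere)  # prints True
```

If that holds on the real cluster as well, data locality alone would not seem to explain why one node is preferred.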

