hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: running hadoop on heterogeneous hardware
Date Thu, 22 Jan 2009 10:17:00 GMT
Bill Au wrote:
> Is hadoop designed to run on homogeneous hardware only, or does it work just
> as well on heterogeneous hardware?  If the datanodes have different
> disk capacities, does HDFS still spread the data blocks equally among all
> the datanodes, or will the datanodes with high disk capacity end up storing
> more data blocks?  Similarly, if the tasktrackers have different numbers of
> CPUs, is there a way to configure hadoop to run more tasks on those
> tasktrackers that have more CPUs?  Is that simply a matter of setting
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum differently on the tasktrackers?
> Bill

Life is simpler on homogeneous boxes. Setting the maximum tasks
differently for the different machines does limit the amount of work
that gets pushed out to those boxes. More troublesome are slower
CPUs/HDDs: they aren't picked up directly, though speculative
execution can handle some of this.
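
As a rough illustration of per-machine limits, the sketch below derives the two settings named in the question from a node's core count and prints them as configuration properties. The property names come from the question; the helper names and the slots-per-core ratios are assumptions made up for this example, not Hadoop defaults or recommendations.

```python
from xml.sax.saxutils import escape


def slot_limits(cpu_cores, maps_per_core=1.0, reduces_per_core=0.5):
    """Hypothetical heuristic: scale task slots with core count, at least 1 each."""
    map_slots = max(1, int(cpu_cores * maps_per_core))
    reduce_slots = max(1, int(cpu_cores * reduces_per_core))
    return map_slots, reduce_slots


def property_xml(name, value):
    """Render one <property> element in the Hadoop configuration file style."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{escape(str(value))}</value>\n"
        "</property>"
    )


def tasktracker_config(cpu_cores):
    """Build the two per-node slot properties for a machine with cpu_cores cores."""
    map_slots, reduce_slots = slot_limits(cpu_cores)
    return "\n".join([
        property_xml("mapred.tasktracker.map.tasks.maximum", map_slots),
        property_xml("mapred.tasktracker.reduce.tasks.maximum", reduce_slots),
    ])


if __name__ == "__main__":
    # Two example machines; the core counts are illustrative only.
    for cores in (2, 8):
        print(f"<!-- node with {cores} cores -->")
        print(tasktracker_config(cores))
```

Each node would carry its own values in its local configuration, so the bigger machines advertise more slots. The ratios still have to be picked by hand, which is the limitation discussed next.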

One interesting bit of research would be something adaptive: something
that monitors throughput and tunes those values based on performance.
That would detect variations in a cluster and work with them, rather
than requiring you to know the capabilities of every machine.
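
To make the adaptive idea concrete, here is a minimal sketch of one possible approach: a hill-climbing controller that nudges a node's slot limit up or down depending on whether measured throughput improved. Nothing here is an existing Hadoop feature; `SlotTuner`, its bounds, and the throughput numbers are all hypothetical, and the sketch does not cover how throughput would be measured or how a new limit would be applied to a running tasktracker.

```python
class SlotTuner:
    """Hypothetical hill-climbing controller for one node's task slot limit."""

    def __init__(self, slots, min_slots=1, max_slots=16):
        self.slots = slots
        self.min_slots = min_slots
        self.max_slots = max_slots
        self.direction = 1           # +1: try more slots, -1: try fewer
        self.last_throughput = None  # throughput seen at the previous limit

    def update(self, throughput):
        """Record throughput measured at the current limit; return the next limit to try."""
        if self.last_throughput is not None and throughput < self.last_throughput:
            # The last change made things worse, so step back the other way.
            self.direction = -self.direction
        self.last_throughput = throughput
        candidate = self.slots + self.direction
        self.slots = min(self.max_slots, max(self.min_slots, candidate))
        return self.slots


if __name__ == "__main__":
    # Made-up throughput curve that peaks at 4 slots, standing in for real measurements.
    def measured_throughput(slots):
        return 16 - (slots - 4) ** 2

    tuner = SlotTuner(slots=2)
    for _ in range(8):
        print(tuner.update(measured_throughput(tuner.slots)))
```

With this toy curve the limit climbs from 2 and then hovers around the peak. Real measurements would be noisy and job-dependent, so a usable version would need smoothing and some damping on top of this.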

