hadoop-common-user mailing list archives

From "Ian Holsman (Lists)" <li...@holsman.net>
Subject data locality in HDFS
Date Wed, 18 Jun 2008 06:18:15 GMT

I want to run a distributed cluster where I have, say, 20 machines/slaves 
in 3 separate data centers that belong to the same cluster.

Ideally, I would like the other machines in each data center to be able to 
upload files (Apache log files in this case) onto the local slaves, and 
then have the map/reduce tasks do their magic without having to move data 
until the reduce phase, when the amount of data will be smaller.

Does Hadoop have this functionality?
How do people handle multi-datacenter logging with Hadoop in this case? 
Do you just copy the data into a central location?

