hadoop-hdfs-user mailing list archives

From Julian Bui <julian...@gmail.com>
Subject basic question about rack awareness and computation migration
Date Tue, 05 Mar 2013 11:49:24 GMT
Hi hadoop users,

I'm trying to find out if computation migration is something the developer
needs to worry about or if it's supposed to be hidden.

I would like to use Hadoop to take in a list of image paths in HDFS and
then have each task compress these large, raw images into something much
smaller, say JPEG files.

Input: list of image paths
Output: compressed JPEG files

Since I don't really need a reduce task (I'm using Hadoop mainly for its
reliability and orchestration aspects), my mapper ought to just take the
list of image paths and then work on them.  As I understand it, each image
will likely be on multiple data nodes.

My question is: how will each mapper task "migrate the computation" to the
data nodes?  I recall reading that the namenode is supposed to deal with
this.  Is it hidden from the developer?  Or, as the developer, do I need to
discover where the data lies and then migrate the task to that node?  Since
my input is just a list of paths, it seems like the namenode couldn't
really do this for me.
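
To make "migrating the computation" concrete, here is a toy sketch of
locality-preferring task assignment. It is not Hadoop's scheduler code: the
class LocalitySketch, the Split type, the pickFor method, the node names and
the /images paths are all hypothetical, made up for illustration. It only
assumes that each unit of work carries a list of hosts that store its data,
and that a free node is offered a unit stored locally before any other. Note
that if the job input is a text file of paths, the hosts known up front would
presumably be those of the list file, not of the images it names.

```java
import java.util.*;

// Toy model only: NOT Hadoop code. All names here are hypothetical.
public class LocalitySketch {
    /** Stand-in for a unit of work: a file path plus the hosts that store its data. */
    static final class Split {
        final String path;
        final Set<String> hosts;
        Split(String path, String... hosts) {
            this.path = path;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    /**
     * Picks a pending split for a node that has a free task slot.
     * Prefers a split whose data is on that node; otherwise falls back
     * to the first pending split (a non-local, "remote read" assignment).
     * Returns null when nothing is pending.
     */
    static Split pickFor(String freeNode, List<Split> pending) {
        for (Split s : pending) {
            if (s.hosts.contains(freeNode)) {
                pending.remove(s);   // safe: we return right after removing
                return s;
            }
        }
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        List<Split> pending = new ArrayList<>(Arrays.asList(
            new Split("/images/a.raw", "node1", "node2"),
            new Split("/images/b.raw", "node3", "node4")));

        // node3 stores b.raw, so it gets the local split.
        System.out.println("node3 -> " + pickFor("node3", pending).path);
        // node9 stores nothing, so it falls back to the remaining split.
        System.out.println("node9 -> " + pickFor("node9", pending).path);
    }
}
```

In this model the developer never moves a task by hand; the assignment step
consults the host list. Whether that matches what Hadoop does for a job whose
input is only a list of paths is exactly the open question here.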

Another question: Where can I find out more about this?  I've looked up
"rack awareness" and "computation migration" but haven't really found much
code relating to either one - leading me to believe I'm not supposed to
have to write code to deal with this.

Anyway, could someone please help me out or set me straight on this?

