hadoop-hdfs-user mailing list archives

From Mark Hedges <hed...@formdata.biz>
Subject hdfs nfs mounts and balancing for local access
Date Thu, 15 Dec 2011 19:01:54 GMT

Hi.  I was thinking about how HDFS could be useful on a big
cluster running a traditional grid scheduler, in which users
run arbitrary open-source and commercial applications under
some license management system.  (Take movie graphics
rendering as an example.)  It's a high bar to expect vendors
to implement hdfs:// URLs, so NFS access seems like the
workable solution.

My question is, when autofs mounts from HDFS via NFS, does
it hand off the connection to the node holding the data that
is closest on the switch to the machine asking for it, as it
would for a Hadoop MapReduce application?

Or, does all the file data have to pass through a central
chokepoint on the NFS interface?

In the movie rendering example, an app on one node might need
more disk than is available on a single node, so if the
cluster were configured with HDFS/NFS, it could use blocks
from the distributed filesystem.  But if all the nodes were
active and had to read and write all their traffic through
the NameNode and back out to the nodes containing the blocks,
it seems like it would lose a lot of the distributed
advantage, and the NameNode or the switch could get saturated
pretty quickly.

Thanks for the info.

Mark

