hadoop-common-user mailing list archives

From Sean Arietta <sarie...@virginia.edu>
Subject Forcing Many Map Nodes
Date Wed, 08 Jul 2009 20:08:35 GMT

I have a rather specific question. Hopefully someone can help me get to the
bottom of this.

I need to be able to run a piece of code on an arbitrary number of physical
nodes. My initial thought was that I could trick the Hadoop API into
executing code on these nodes by having a FileInputFormat emit splits whose
hosts are set to different physical nodes. This would look something like:

nodes = getNodeList()
foreach node in nodes
    create split with hosts = node.getHostName()
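
The pseudocode above corresponds to a custom `InputFormat` whose `getSplits()` returns one `InputSplit` per host, with each split's `getLocations()` naming that host. The sketch below models only that split-building loop in plain Java, so it runs without Hadoop on the classpath. `OneSplitPerHost`, `HostSplit`, `makeSplits` and the host names are hypothetical. A real split class would extend `org.apache.hadoop.mapreduce.InputSplit` (or implement the older `org.apache.hadoop.mapred.InputSplit` interface) and would typically also need to be `Writable` so it can be shipped to the tasks.

```java
import java.util.ArrayList;
import java.util.List;

class OneSplitPerHost {
    /** Hypothetical stand-in for an InputSplit: it carries no data, only a host hint. */
    static final class HostSplit {
        private final String host;

        HostSplit(String host) {
            this.host = host;
        }

        // Nothing to read; the split exists only to carry a location hint.
        long getLength() {
            return 0L;
        }

        // Mirrors InputSplit.getLocations(): the host(s) the scheduler should prefer.
        String[] getLocations() {
            return new String[] { host };
        }
    }

    // Same loop as the pseudocode: one split per node, hosts = that node.
    static List<HostSplit> makeSplits(List<String> hostNames) {
        List<HostSplit> splits = new ArrayList<>();
        for (String host : hostNames) {
            splits.add(new HostSplit(host));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Placeholder host names; in practice they would come from the cluster.
        for (HostSplit split : makeSplits(List.of("node01", "node02", "node03"))) {
            System.out.println(split.getLocations()[0]);
        }
    }
}
```

Because such a split carries no real data, the matching RecordReader would also have to emit at least one dummy record, since map() is only called once per record.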

Now for my questions. The first is whether this will actually achieve my
goal. I know that Hadoop attempts to move code to where the data is, but is
this enough to trick the API, or does it do something more clever, like
inspecting the file that is passed along with the split?
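
One reading of how the JobTracker schedules map tasks (an assumption, not a confirmed answer): split hosts are only a preference. When a TaskTracker with a free map slot heartbeats, the scheduler tries a node-local task first, then a rack-local one, and otherwise hands out any remaining task rather than leave the slot idle. It works from the locations the split reports and does not inspect the input file itself. The toy model below is plain Java, not Hadoop code; `LocalityPreferenceDemo`, `Task` and `assign` are hypothetical names, and the rack-local step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class LocalityPreferenceDemo {
    /** A pending map task and the single host its split names. */
    record Task(int id, String preferredHost) {}

    private final List<Task> pending = new ArrayList<>();

    LocalityPreferenceDemo(List<Task> tasks) {
        pending.addAll(tasks);
    }

    /** Called when a worker on `host` has a free slot; returns the task it is given. */
    Optional<Task> assign(String host) {
        for (int i = 0; i < pending.size(); i++) {
            if (pending.get(i).preferredHost().equals(host)) {
                return Optional.of(pending.remove(i)); // local task wins if one exists
            }
        }
        // No local task left: fall back to any pending task instead of idling.
        return pending.isEmpty() ? Optional.empty() : Optional.of(pending.remove(0));
    }

    public static void main(String[] args) {
        LocalityPreferenceDemo sched = new LocalityPreferenceDemo(List.of(
                new Task(0, "node01"), new Task(1, "node02"), new Task(2, "node03")));
        // node01 has several free slots and asks three times before anyone else.
        for (int i = 0; i < 3; i++) {
            System.out.println("node01 gets " + sched.assign("node01").orElse(null));
        }
    }
}
```

In `main`, node01 asks three times before the other nodes and receives all three tasks, including the two whose splits named node02 and node03. Under this model, one split per host makes one task per host likely but not guaranteed.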

Second, I cannot even carry out the approach above, because I cannot find a
way to get a list of the currently live DataNodes. I have been all through the
API and found some methods that expose this information, but I cannot access
them. Specifically, I found FSNamesystem (which I cannot access) and JspHelper,
which fails with a NullPointerException when I call its default constructor.
So the second question is: does anyone know a way to get a list of the live
DataNodes from within a MapReduce program?
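
Two possibly simpler routes, not verified here: `org.apache.hadoop.hdfs.DistributedFileSystem` appears to offer a public `getDataNodeStats()` method returning `DatanodeInfo[]`, and `hadoop dfsadmin -report` prints the same information on the command line. The sketch below takes the second route in plain Java and extracts host names from report text, assuming each DataNode appears on a `Name: <host>:<port>` line. The sample input is made up; check the format against your version's output, which may also list dead nodes. `LiveDataNodes` and `parseReport` are hypothetical. Since map tasks run on TaskTrackers, this also assumes every DataNode host runs a TaskTracker.

```java
import java.util.ArrayList;
import java.util.List;

class LiveDataNodes {
    /**
     * Extracts host names from text shaped like `hadoop dfsadmin -report` output,
     * assuming one "Name: <host>:<port>" line per DataNode (format is an assumption).
     */
    static List<String> parseReport(String report) {
        List<String> hosts = new ArrayList<>();
        for (String line : report.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("Name:")) {
                continue; // skip summary and per-node detail lines
            }
            String addr = trimmed.substring("Name:".length()).trim();
            int colon = addr.lastIndexOf(':');
            // Drop the ":port" suffix if present.
            hosts.add(colon >= 0 ? addr.substring(0, colon) : addr);
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Made-up sample, not real cluster output.
        String sample = String.join("\n",
                "Datanodes available: 2 (2 total, 0 dead)",
                "",
                "Name: 10.0.0.11:50010",
                "Decommission Status : Normal",
                "",
                "Name: 10.0.0.12:50010");
        System.out.println(parseReport(sample)); // [10.0.0.11, 10.0.0.12]
    }
}
```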

Thanks a lot for your help!

View this message in context: http://www.nabble.com/Forcing-Many-Map-Nodes-tp24398403p24398403.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
