hadoop-common-user mailing list archives

From Marco Nicosia <ma...@escape.org>
Subject Re: Few Issues!!!
Date Fri, 12 Jun 2009 15:39:12 GMT
In case you haven't seen these pages, I recommend:
   * <http://hadoop.apache.org/core/docs/current/quickstart.html>
   * <http://hadoop.apache.org/core/docs/current/cluster_setup.html>

Here's some more background information:

Data doesn't flow through the NameNode; only metadata transactions
do, such as "file open," "new block," and "where are the blocks of
a file?"

Regardless of where you run the HDFS client, on a DataNode, on the
NameNode, or on any other machine, you must have the environment
variable HADOOP_CONF_DIR (or the --config option) pointing at a
location that contains the Hadoop configuration files (hadoop-env.sh,
hadoop-default.xml, core-site.xml, etc.) so that the client can
contact the NameNode. This means you have to distribute your Hadoop
configuration files to any computer on which you intend to run the
HDFS client; otherwise the client will not know how to find the
NameNode. The NameNode will tell the client which DataNodes to
connect to in order to write data blocks.
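Concretely, the setting the client needs is fs.default.name in the
configuration directory. A sketch of what that looks like (the file
name applies to then-current releases, and the host and port below
are placeholders, not from the original mail):

```xml
<!-- core-site.xml, somewhere under $HADOOP_CONF_DIR -->
<!-- Replace the host/port with your NameNode's actual address. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```

With that file distributed to a machine, the client run there can find
the NameNode; no DataNode addresses need to be configured anywhere.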

So, the command you should run is:

hadoop dfs -put /home/hadoop/Desktop/test.java /user/sugandha

... no matter where you run the command. The HDFS client will use
the configuration setting fs.default.name to contact the NameNode,
which will register the fact that the client intends to create a
file, and tell the client where it can start writing the first
block. (Thus, you never need to give the HDFS client the address
of a DataNode.) As the client adds blocks, the NameNode will tell
it where to put those subsequent blocks as well.
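The same write path can be driven from Java through Hadoop's
FileSystem API, so no hand-rolled RPC clients or servers are needed.
A minimal, untested sketch (the local and HDFS paths are just the
examples from this thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
    public static void main(String[] args) throws Exception {
        // Loads the config files (and thus fs.default.name, the
        // NameNode address) from the classpath / HADOOP_CONF_DIR.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copies a local file into HDFS. The client contacts the
        // NameNode first, then streams blocks to whichever DataNodes
        // the NameNode tells it to use.
        fs.copyFromLocalFile(new Path("/home/hadoop/Desktop/test.java"),
                             new Path("/user/sugandha"));
        fs.close();
    }
}
```

As with the command-line client, this works from any machine that has
the configuration files; the FileSystem class handles the NameNode
and DataNode communication internally.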

Hope this helps!

-- Marco N. (Who is going away for the weekend, so please direct
follow-up questions back to the core-user mailing list.)

PS - Remember that unless you're resource constrained, you usually
want to run both DataNode and TaskTracker processes on the same
machines. You want your Map-Reduce code to run on the same computers
that are storing the input and output data on HDFS.

Sugandha Naolekar (sugandha.n87@gmail.com) wrote: 
> I have a 7-node cluster.
> Now if I ssh to the NN and type in hadoop -put /home/hadoop/Desktop/test.java
> /user/hadoop ------> the file gets placed in HDFS and gets replicated
> automatically.
> Now if the same file is on one of the datanodes, in the same location, and I
> want to place it in HDFS through the NN without ssh'ing to that
> datanode -------> then what should the format of the command be?
> I tried hadoop -put
> /user/hadoop-------> here, 30 Ip is the datanode.
> But it didn't work. Also, I want to make it work through Java code by using
> all these APIs. So will I have to invoke RPC client and server methods to
> resolve this??
> Also, if this complete structure is executed on a remote node that has no
> connection with hadoop, what kind of scenarios will I have to face?
> Thanks!
> -- 
> Regards!
> Sugandha
Marco E. Nicosia  |  http://www.escape.org/~marco/  |  marco@escape.org
