hadoop-common-user mailing list archives

From Tim Wintle <tim.win...@teamrubber.com>
Subject Re: How can HDFS spread the data across the data nodes ?
Date Mon, 02 Feb 2009 00:55:56 GMT
I believe the standard advice is to write to the cluster from a machine
that is not running a Hadoop datanode itself. Otherwise the first
replica is written to the local node (so with a replication factor of 1,
the whole file stays on that node) to avoid congestion on the network.
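To illustrate, here is a minimal sketch of that default placement behaviour, ignoring rack awareness and random node selection. This is not Hadoop's actual code; `PlacementSketch` and `chooseTargets` are made-up names for illustration (the real decision is made on the NameNode side).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, illustrative model of HDFS's default replica placement:
// if the writer is itself a datanode, the first replica lands on that
// node; remaining replicas go to other nodes. Real HDFS also considers
// racks and picks among candidates randomly.
public class PlacementSketch {

    // writer:      hostname of the machine running "hadoop fs -put"
    // datanodes:   live datanodes known to the namenode
    // replication: requested replication factor
    static List<String> chooseTargets(String writer,
                                      List<String> datanodes,
                                      int replication) {
        List<String> targets = new ArrayList<>();
        // First replica: the local node if the writer is a datanode,
        // otherwise some datanode (first in the list, for simplicity).
        if (datanodes.contains(writer)) {
            targets.add(writer);
        } else {
            targets.add(datanodes.get(0));
        }
        // Remaining replicas: other datanodes, until the factor is met.
        for (String dn : datanodes) {
            if (targets.size() >= replication) break;
            if (!targets.contains(dn)) targets.add(dn);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("datanode1", "datanode2");
        // Writing from a datanode with replication 1: stays local.
        System.out.println(chooseTargets("datanode1", cluster, 1)); // [datanode1]
        // Writing from a non-datanode client: lands on a datanode.
        System.out.println(chooseTargets("client", cluster, 1));
    }
}
```

With replication 1 and the writer on `datanode1`, every block stays on `datanode1`, which is exactly the "file doesn't spread" behaviour described below.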


On Sun, 2009-02-01 at 15:09 -0800, kang_min82 wrote:
> Hi everyone, 
> I'm completely new to HDFS. I hope you can take a little time to answer my
> question :).
> I have three nodes in my cluster: one reserved for the master (Namenode and
> JobTracker) and the other two for slaves (Datanodes).
> I tried to "copy" a file to HDFS with the following command:
> kang@vn:~/v-0.18.0$ hadoop-0.18.0/bin/hadoop fs -put test_file /
> If I start the command on the master, HDFS spreads my file across the data
> nodes. That is fine! But when I start the command on either data node,
> HDFS doesn't spread the file, meaning the whole file is written only to
> that data node. Is this a bug?
> My question is, how does HDFS manage this, and which Java class
> is involved?
> I read the script bin/hadoop and saw that the class FsShell.java and the
> method copyFromLocal are involved. But I don't see how the master
> decides which data nodes a file is written to.
> Any help is appreciated, thanks so much.
> Kang
