hadoop-hdfs-user mailing list archives

From Ognen Duzlevski <og...@nengoiksvelzud.com>
Subject HDFS question
Date Tue, 28 Jan 2014 16:42:59 GMT

I am new to Hadoop and HDFS, so I may be misunderstanding something, but I
have the following issue:

I have set up a name node and a bunch of data nodes for HDFS. Each node
contributes 1.6TB of space so the total space shown on the hdfs web front
end is about 25TB. I have set the replication to be 3.
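
As a rough sanity check on those figures, the arithmetic can be sketched as
below. The variable names are only for illustration, and the numbers are the
approximate ones quoted above, so the results are estimates.

```shell
# Back-of-the-envelope capacity check using the figures quoted above.
raw_tb=25        # total space shown on the HDFS web front end (approximate)
per_node_tb=1.6  # space contributed by each data node
replication=3    # configured replication factor

# Approximate number of data nodes implied by the totals, rounded to a whole node.
nodes=$(awk -v r="$raw_tb" -v p="$per_node_tb" 'BEGIN { printf "%.0f", r / p }')

# Approximate room for unique data once every block is stored $replication times.
usable_tb=$(awk -v r="$raw_tb" -v k="$replication" 'BEGIN { printf "%.1f", r / k }')

echo "approx data nodes: $nodes"
echo "approx usable TB at replication $replication: $usable_tb"
```

With replication set to 3, each block is stored three times, so the space
available for unique data is roughly a third of the raw total.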

I am downloading large files onto a single data node from Amazon's S3 using
the distcp command, like this:

 hadoop --config /etc/hadoop distcp

Where is the Hadoop Name node.

The result is that the machine I am running these commands on (one of the
data nodes) ends up holding all the files; they do not seem to be
"spreading" across the HDFS cluster.
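
One way to check where the blocks actually ended up is sketched below. The
directory /user/example/s3-import is a hypothetical placeholder for the real
distcp destination, and the hdfs launcher script is an assumption; on older
installs the same subcommands may be reached as "hadoop fsck" and
"hadoop dfsadmin" instead.

```shell
# Hypothetical HDFS destination directory; substitute the real distcp target.
target_dir="/user/example/s3-import"

if command -v hdfs >/dev/null 2>&1; then
  # List each file's blocks and the data nodes holding every replica.
  hdfs fsck "$target_dir" -files -blocks -locations
  # Show per-node capacity and usage to see whether one node is filling up.
  hdfs dfsadmin -report
else
  echo "hdfs client not found on PATH; run this on a cluster node" >&2
fi
```

If the fsck output lists several different data nodes for each block, the
replicas are being spread even though the local node holds a copy of
everything.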

Is this expected? Did I completely misunderstand the point of a parallel
DISTRIBUTED file system? :)
