hadoop-common-user mailing list archives

From Null Ecksor <nulleck...@gmail.com>
Subject copying file into hdfs
Date Sat, 10 Apr 2010 18:03:02 GMT

I'm Mike,
I am a new user of Hadoop. Currently, I have a cluster of 8 machines and a
file of size 2 GB.
When I load it into HDFS using the command
hadoop dfs -put /a.dat /data
it actually loads it onto all data nodes. dfsadmin -report shows HDFS usage of
16 GB, and it takes 2 hours to load that data file.

With 1 node, my MapReduce operation on this data took 150 seconds.

When I ran the same MapReduce operation on this cluster, it took 220
seconds for the same file.

Can someone please tell me how to distribute this file over the 8 nodes, so
that each of them holds a roughly 300 MB chunk of the file and the MapReduce
operation that I have written works in parallel? Isn't a Hadoop cluster
supposed to work in parallel?
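
The figures in this message can be sanity-checked with a little arithmetic. The Python sketch below is only a rough illustration. It assumes binary units (1 GB = 1024 MB), that the 16 GB reported by dfsadmin -report comes entirely from this one file, and that data is spread evenly across the nodes. None of these is stated in the message, and the helper name per_node_mb is hypothetical.

```python
# Back-of-the-envelope check of the figures reported in this message.
# Assumptions (not stated in the message): sizes are binary units
# (1 GB = 1024 MB), the reported usage comes only from this file,
# and data is spread evenly across the nodes.

FILE_MB = 2 * 1024        # the 2 GB input file
NODES = 8                 # machines in the cluster
REPORTED_MB = 16 * 1024   # HDFS usage shown by dfsadmin -report


def per_node_mb(file_mb, copies, nodes):
    """Average data held per node if all copies are spread evenly."""
    return file_mb * copies / nodes


# Ratio of reported usage to file size: how many full copies are stored.
copies = REPORTED_MB / FILE_MB

# Share per node if the file were stored exactly once.
single_copy_share = per_node_mb(FILE_MB, 1, NODES)

# Share per node at the reported usage.
reported_share = per_node_mb(FILE_MB, copies, NODES)

print(f"copies stored: {copies:.0f}")
print(f"per node, one copy: {single_copy_share:.0f} MB")
print(f"per node, as reported: {reported_share:.0f} MB")
```

Under these assumptions, the reported usage corresponds to 8 full copies of the file (about 2048 MB per node), which matches the observation that the file lands on all data nodes. A single copy split evenly would put 256 MB on each node, close to the roughly 300 MB asked about.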

