hadoop-common-user mailing list archives

From Raj V <rajv...@yahoo.com>
Subject Re: hdfs space problem.
Date Thu, 05 Aug 2010 18:28:39 GMT
Thank you. I realized that I was running the datanode on the namenode and 
stopped it, but did not know that the first copy went to the local node.


From: Dmitry Pushkarev <umka@stanford.edu>
To: common-user@hadoop.apache.org
Sent: Thu, August 5, 2010 11:02:08 AM
Subject: RE: hdfs space problem.

When you copy files from a node that runs a local datanode, the first replica
of each block is placed on that local node. Just stop the datanode on the node
from which you copy files, and the blocks will end up on random nodes.

Also don't run datanode at the same machine as namenode.
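In practice, the workaround might look like this (a sketch only, assuming a 2010-era Hadoop deployment where hadoop-daemon.sh and the balancer are on the PATH; the /mnt/data and /data paths are illustrative, not from the original thread):

```shell
# Stop the local datanode on the ingest node, so HDFS cannot pin the
# first replica of every block to this machine.
hadoop-daemon.sh stop datanode

# Copy the data in; block replicas now land on randomly chosen datanodes.
hadoop fs -copyFromLocal /mnt/data/data1 /data

# Restart the local datanode afterwards if this node should keep serving blocks.
hadoop-daemon.sh start datanode

# If the cluster is already skewed, run the balancer; -threshold is the
# allowed deviation (in percent) of each node's utilization from the
# cluster average.
hadoop balancer -threshold 5
```

Note that the balancer moves existing blocks to even out utilization, but it does not change where future writes are placed, so stopping the local datanode during the copy is the more direct fix.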

-----Original Message-----
From: Raj V [mailto:rajvish@yahoo.com] 
Sent: Thursday, August 05, 2010 8:33 AM
To: common-user@hadoop.apache.org
Subject: hdfs space problem.

I run a 512 node hadoop cluster. Yesterday I moved 30 GB of compressed data
from an NFS mounted partition by running, on the namenode:

hadoop fs -copyFromLocal  /mnt/data/data1 /mnt/data/data2 mnt/data/data3 

When the job completed, the local disk on the namenode was 40% full (most of
it used by the dfs directories) while the others had 1% disk utilization.

Just to see if there was an issue, I deleted the hdfs:/data directory and 
restarted the move from a datanode. 

Once again the disk space on that data node was substantially over utilized.

I would have assumed that the disk space would be more or less uniformly 
consumed on all the data nodes.

Is there a reason why one disk would be over utilized? 

Do I have to run the balancer every time I copy data?

Am I missing something?
