hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: Does Hadoop Honor Reserved Space?
Date Thu, 06 Mar 2008 19:56:35 GMT
But intermediate data is stored in a different directory from dfs/data (something like mapred/local
by default, I think).

What version are you running?
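For reference, the intermediate output location is controlled by mapred.local.dir, which defaults to a directory under hadoop.tmp.dir (a sketch of the relevant property; double-check the defaults shipped with your version):

```xml
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate
  data files. May be a comma-separated list of directories on different
  devices in order to spread disk I/O.
  </description>
</property>
```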


-----Original Message-----
From: Ashwinder Ahluwalia on behalf of ahluwalia5@yahoo.com
Sent: Thu 3/6/2008 10:14 AM
To: core-user@hadoop.apache.org
Subject: RE: Does Hadoop Honor Reserved Space?
 
I've run into a similar issue in the past. From what I understand, this
parameter only controls HDFS space usage. The intermediate data produced by a
map-reduce job, however, is stored on the local file system (not HDFS) and is
not subject to this setting.

In the past I have used mapred.local.dir.minspacekill and
mapred.local.dir.minspacestart to control the amount of space that is allowable
for use by this temporary data. 
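For example (the byte values below are illustrative, not recommendations — both properties default to 0, i.e. disabled):

```xml
<!-- Stop scheduling new tasks on this node when free space on
     mapred.local.dir falls below this many bytes (10 GiB here,
     an arbitrary choice). -->
<property>
  <name>mapred.local.dir.minspacestart</name>
  <value>10737418240</value>
</property>

<!-- Stop accepting tasks and start killing running ones when free
     space falls below this many bytes (5 GiB here, an arbitrary
     choice). -->
<property>
  <name>mapred.local.dir.minspacekill</name>
  <value>5368709120</value>
</property>
```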

Not sure if that is the best approach though, so I'd love to hear what other
people have done. In your case, the map-red job consumed more disk than the
nodes had available (with no limit set, there simply wasn't enough capacity
for it), so looking at mapred.output.compress and mapred.compress.map.output
might also help decrease the job's disk requirements.
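For example, something like this in hadoop-site.xml should shrink both the intermediate map output and the final job output (codec choice left at the default):

```xml
<!-- Compress intermediate map output before it is written to
     mapred.local.dir and shuffled to the reducers. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- Compress the final job output written to HDFS. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
```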

--Ash

-----Original Message-----
From: Jimmy Wan [mailto:jimmy@indeed.com] 
Sent: Thursday, March 06, 2008 9:56 AM
To: core-user@hadoop.apache.org
Subject: Does Hadoop Honor Reserved Space?

I've got 2 datanodes setup with the following configuration parameter:
	<property>
	  <name>dfs.datanode.du.reserved</name>
	  <value>429496729600</value>
	  <description>Reserved space in bytes per volume. Always leave this
	  much space free for non dfs use.
	  </description>
	</property>

Both are housed on 800GB volumes, so I thought this would keep about half  
the volume free for non-HDFS usage.

After some long running jobs last night, both disk volumes were completely  
filled. The bulk of the data was in:
${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data

This is running as the user hadoop.

Am I interpreting these parameters incorrectly?

I noticed this issue, but it is marked as closed:  
http://issues.apache.org/jira/browse/HADOOP-2549

-- 
Jimmy


