hadoop-common-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: Question about DFS Reserved Space
Date Thu, 09 Jun 2011 17:24:39 GMT
My understanding of HDFS is limited, but I believe it's
> DFS will use (total disk size - 10 GB)
and not 
> always leave 10 GB free?

The datanode simply computes 'df' - reserved_space (10 GB here) and *uses* up to that amount.
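A rough sketch of that arithmetic (the function and variable names here are illustrative, not the actual DataNode code): capacity is the whole disk minus the reserve, and remaining writable space is the disk's current free space minus the reserve, which is why non-DFS usage eats into what the DN will accept.

```python
# Illustrative sketch of the DataNode's space accounting.
# These names are made up for the example; the real logic lives
# inside the DataNode's volume accounting, not in user code.

def dfs_capacity(disk_total_bytes, reserved_bytes):
    """Capacity offered to HDFS: whole-disk size minus the reserve."""
    return max(disk_total_bytes - reserved_bytes, 0)

def dfs_remaining(disk_free_bytes, reserved_bytes):
    """Space the DN will still write into: current free space minus the reserve."""
    return max(disk_free_bytes - reserved_bytes, 0)

GB = 1024 ** 3
# 100 GB disk, 10 GB reserved, 30 GB already used by non-DFS data:
print(dfs_capacity(100 * GB, 10 * GB) // GB)   # 90
print(dfs_remaining(70 * GB, 10 * GB) // GB)   # 60
```

Note how growing non-DFS usage shrinks `disk_free_bytes`, and with it the space the DN will accept, without the reserve value ever changing.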


On 6/9/11 10:13 AM, "Harsh J" <harsh@cloudera.com> wrote:

> Landy,
> On Thu, Jun 9, 2011 at 10:05 PM, Bible, Landy <landy-bible@utulsa.edu> wrote:
>> Hi all,
>> I'm planning a rather non-standard HDFS cluster. The machines will be doing
>> more than just DFS, and each machine will have varying local storage
>> utilization outside of DFS. If I use the "dfs.datanode.du.reserved" property
>> and reserve 10 GB, does that mean DFS will use (total disk size - 10 GB), or
>> that it will always leave 10 GB free? Basically, is the disk usage outside
>> DFS (OS + other data) taken into account?
> The latter (it will always leave 10 GB free). The whole disk is taken into
> account when computing available space, so yes, even external data has an
> influence.
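For reference, the reservation is set per datanode in hdfs-site.xml, with the value in bytes (10 GB shown below; the property name is confirmed in the question above, the surrounding snippet is a standard config sketch):

```xml
<!-- hdfs-site.xml: reserve 10 GB per volume for non-DFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
```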
>> As usage outside of DFS grows I'd like DFS to back off the disk, and migrate
>> blocks to other nodes.  If this isn't the current behavior, I could create a
>> script to look at disk usage every few hours and modify the reserved property
>> dynamically.  If the property is changed on a single datanode and it is
>> restarted, will the datanode then start moving blocks away?
> Why would you need to modify the reserve values once they are set to a
> comfortable value? The DN monitors its disk space by itself, so you
> don't have to.
> The DN will also not move blocks away if the reserved limit is violated
> (because you increased it, say). However, it will begin refusing any new
> writes. You may need to run the Balancer to move blocks around and
> rebalance the DNs, though.
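If rebalancing does become necessary, the Balancer is started from the command line; the threshold argument (per-DN deviation from the cluster-wide average utilization, in percent) is optional:

```shell
# Rebalance until every DN is within 5% of the cluster average utilization
hadoop balancer -threshold 5
```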
>> My other option is to just set the reserved amount very high on every node,
>> but that will lead to a lot of wasted space as many nodes won't have a very
>> large storage demand outside of DFS.
> How about keeping one disk dedicated for all other intents outside of
> the DFS's grasp?
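One way to do that (assuming the 0.20-era property name dfs.data.dir; the mount point below is hypothetical) is to point the datanode only at the dedicated disk:

```xml
<!-- hdfs-site.xml: give DFS only the dedicated disk -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/dfs-disk/data</value>
</property>
```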
