hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-296) Do not assign blocks to a datanode with < x mb free
Date Wed, 14 Jun 2006 01:51:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-296?page=comments#action_12416109 ] 

Konstantin Shvachko commented on HADOOP-296:

If you look further down in FSNamesystem.chooseTarget(), there is code that selects only nodes
with enough space
for at least MIN_BLOCKS_FOR_WRITE (5 by default) blocks.
Then, when data nodes calculate their remaining disk space (see FSDataset.getRemaining()), they
take into account the value of the member FSDataset.reserved, which is initially set to 0 and
then reflects the amount of space allocated
for the ongoing block creates.
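To make the interaction concrete, here is a minimal standalone sketch of the two checks described above. The class and method names mirror the Hadoop code being discussed, but the block size, the subtraction of reserved space, and the numbers are illustrative assumptions, not the actual implementation:

```java
// Hedged sketch of the free-space logic described above. BLOCK_SIZE and the
// exact arithmetic are assumptions for illustration only.
public class FreeSpaceCheck {
    static final long BLOCK_SIZE = 32L * 1024 * 1024;  // assumed block size
    static final int MIN_BLOCKS_FOR_WRITE = 5;         // default per the comment

    // Rough analogue of FSDataset.getRemaining(): free disk space with the
    // reserved amount (space held for in-flight block creates) taken out.
    static long getRemaining(long diskFree, long reserved) {
        return diskFree - reserved;
    }

    // Rough analogue of the FSNamesystem.chooseTarget() check: a node is an
    // eligible target only if it can hold MIN_BLOCKS_FOR_WRITE more blocks.
    static boolean isGoodTarget(long remaining) {
        return remaining >= MIN_BLOCKS_FOR_WRITE * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long free = 200L * 1024 * 1024;     // 200 MB free on disk
        long reserved = 50L * 1024 * 1024;  // 50 MB reserved for ongoing creates
        // 150 MB remaining < 5 * 32 MB required, so this node is skipped.
        System.out.println(isGoodTarget(getRemaining(free, reserved)));
    }
}
```

The point of the sketch is that the name node's eligibility test and the data node's reserved-space accounting work together: once `reserved` is nonzero, a node can drop out of the target set even while raw free space looks sufficient.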

I think we should let individual data nodes control the amount of space they need or want
to preserve,
rather than enforcing it on the name node uniformly for all data nodes.
This would solve your problem of configuring machines with very different disk capacities
on the same cluster.

So I propose to add 2 new configuration parameters for data nodes:
1) dfs.datanode.du.pct, which is just a configurable variant of USABLE_DISK_PCT.
2) dfs.datanode.du.reserved, which specifies the amount of space that should always remain
free on the node.
Then, at startup, FSDataset.reserved can be set to dfs.datanode.du.reserved rather than 0,
and USABLE_DISK_PCT can be replaced by dfs.datanode.du.pct.
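If the two parameters were adopted as proposed, a data node's site configuration might look like the fragment below. Neither property exists yet, so the names and example values are assumptions taken from the proposal, not a committed API:

```xml
<!-- Illustrative hadoop-site.xml fragment for the proposed per-datanode
     settings; both property names are hypothetical at this point. -->
<configuration>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value> <!-- always keep 1 GB free on this node -->
  </property>
  <property>
    <name>dfs.datanode.du.pct</name>
    <value>0.98</value> <!-- configurable stand-in for USABLE_DISK_PCT -->
  </property>
</configuration>
```

Because the values live in each node's own configuration, a 60 GB machine and a 400 GB machine in the same cluster could reserve very different amounts without any name-node-side special-casing.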

> Do not assign blocks to a datanode with < x mb free
> ---------------------------------------------------
>          Key: HADOOP-296
>          URL: http://issues.apache.org/jira/browse/HADOOP-296
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.3.2
>     Reporter: Johan Oskarson
>  Attachments: minspace.patch
> We're running a smallish cluster with very different machines, some with only 60 GB hard drives.
> This creates a problem when inserting files into the DFS: these machines run out of space
> quickly and then they cannot run any map/reduce operations.
> A solution would be to not assign any new blocks once free space is below a certain
> user-configurable threshold.
> This free space could then be used by the map/reduce operations instead (if that's on
> the same disk).

This message is automatically generated by JIRA.
If you think it was sent incorrectly, contact one of the administrators:
For more information on JIRA, see:
