hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-297) When selecting node to put new block on, give priority to those with more free space/less blocks
Date Wed, 14 Jun 2006 03:14:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-297?page=comments#action_12416116 ] 

eric baldeschwieler commented on HADOOP-297:
--------------------------------------------

Be careful.  You don't want to put all the hot new blocks on the new nodes in the cluster,
and that is exactly what assigning to the most-free node first will do.  It can lead to real
trouble.  We've seen that before!

You might be able to throw some weight factor in without causing complete destruction.
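
One way to read the "weight factor" idea, as a minimal sketch and not anything taken from the
attached patch: pick the target at random, with each node's chance proportional to its free
space raised to a damping exponent. The names NodeInfo, WeightedTargetChooser and alpha are
hypothetical and are not Hadoop APIs. With alpha = 0 every node that has space is equally
likely (a plain shuffle), with alpha = 1 the choice is proportional to free space, and values
in between favor emptier nodes without sending every new block to them.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a datanode descriptor; not a Hadoop class. */
final class NodeInfo {
    final String name;
    final long freeBytes;

    NodeInfo(String name, long freeBytes) {
        this.name = name;
        this.freeBytes = freeBytes;
    }
}

/** Sketch: weighted random target selection with a damping exponent (assumed design). */
final class WeightedTargetChooser {
    private final Random random;
    private final double alpha; // 0 = uniform over nodes with space, 1 = proportional to free space

    WeightedTargetChooser(Random random, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        this.random = random;
        this.alpha = alpha;
    }

    /** Nodes with no free space get weight 0 and can never be chosen. */
    double weight(NodeInfo node) {
        return node.freeBytes <= 0 ? 0.0 : Math.pow((double) node.freeBytes, alpha);
    }

    NodeInfo choose(List<NodeInfo> nodes) {
        double total = 0.0;
        for (NodeInfo node : nodes) {
            total += weight(node);
        }
        if (total <= 0.0) {
            throw new IllegalStateException("no node has free space");
        }
        // Draw a point in [0, total) and walk the nodes until it is used up.
        double remaining = random.nextDouble() * total;
        NodeInfo last = null;
        for (NodeInfo node : nodes) {
            double w = weight(node);
            if (w <= 0.0) {
                continue;
            }
            last = node;
            remaining -= w;
            if (remaining < 0.0) {
                return node;
            }
        }
        return last; // only reached through floating-point round-off
    }
}
```

With alpha = 0.5, for example, a node with 400 units free is picked about twice as often as
one with 100 units free, instead of four times as often.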

Also, we might want to think about a migration thread that uses a finite amount of bandwidth
rebalancing the cluster.  Choosing old or random blocks to fill empty drives might work better.
Of course, beware of corner cases where non-functional nodes volunteer for infinite migration
and such.
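
The bookkeeping for such a migration thread can be sketched as follows. This is only an
illustration under stated assumptions, not an existing Hadoop component: MigrationBudget, its
methods, and both limits are hypothetical. It caps overall rebalancing traffic at a fixed
byte rate, and it caps how much any single node may absorb, so that a misbehaving node cannot
volunteer for unlimited migration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: bandwidth and per-target limits for a rebalancing thread (hypothetical class). */
final class MigrationBudget {
    private final long bytesPerSecond;    // global cap on rebalancing traffic
    private final long maxBytesPerTarget; // most data one node may receive in this pass
    private final long startNanos;        // when this pass started
    private long bytesSent = 0L;
    private final Map<String, Long> sentPerTarget = new HashMap<>();

    MigrationBudget(long bytesPerSecond, long maxBytesPerTarget, long startNanos) {
        if (bytesPerSecond <= 0 || maxBytesPerTarget <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.maxBytesPerTarget = maxBytesPerTarget;
        this.startNanos = startNanos;
    }

    /** True if the target can take this block without exceeding its per-node cap. */
    boolean mayReceive(String target, long blockBytes) {
        return sentPerTarget.getOrDefault(target, 0L) + blockBytes <= maxBytesPerTarget;
    }

    /**
     * Records a completed transfer and returns how many nanoseconds the caller
     * should wait before starting the next one to stay under the byte rate.
     */
    long recordTransfer(String target, long blockBytes, long nowNanos) {
        bytesSent += blockBytes;
        sentPerTarget.merge(target, blockBytes, Long::sum);
        // Earliest time at which bytesSent is within budget at the configured rate.
        long dueNanos = startNanos + (long) ((double) bytesSent / bytesPerSecond * 1e9);
        return Math.max(0L, dueNanos - nowNanos);
    }
}
```

A migration loop would call mayReceive before picking a target, move the block, then sleep
for the duration returned by recordTransfer (passing System.nanoTime() as the clock).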

This is a non-trivial problem space.

> When selecting node to put new block on, give priority to those with more free space/less blocks
> ------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-297
>          URL: http://issues.apache.org/jira/browse/HADOOP-297
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.3.2
>     Reporter: Johan Oskarson
>     Priority: Minor
>  Attachments: priorityshuffle_v1.patch
>
> As mentioned in a previous bug report:
> We're running a smallish cluster with very different machines, some with only 60 GB hard drives.
> This creates a problem when inserting files into the dfs: these machines run out of space quickly while others have plenty of space free.
> So instead of just shuffling the nodes, I've created a quick patch that first sorts the target nodes by (freespace / blocks).
> It then randomizes the position of the first third of the nodes (so we don't put all the blocks in the file on the same machine).
> I'll let you guys figure out how to improve this.
> /Johan
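
The ordering described above can be sketched roughly as follows. This is not the attached
priorityshuffle_v1.patch, only an illustration of the description: NodeStat and
PriorityShuffle are hypothetical names, and treating a node with zero blocks as if it had one
block (to avoid dividing by zero) is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical per-node statistics; not a Hadoop class. */
final class NodeStat {
    final String name;
    final long freeBytes;
    final int blocks;

    NodeStat(String name, long freeBytes, int blocks) {
        this.name = name;
        this.freeBytes = freeBytes;
        this.blocks = blocks;
    }

    /** freespace / blocks; an empty node is counted as holding one block (assumption). */
    double score() {
        return (double) freeBytes / Math.max(1, blocks);
    }
}

final class PriorityShuffle {
    /** Returns a new list: best-scoring nodes first, with the first third shuffled. */
    static List<NodeStat> order(List<NodeStat> nodes, Random random) {
        List<NodeStat> sorted = new ArrayList<>(nodes);
        // Highest score first.
        sorted.sort((a, b) -> Double.compare(b.score(), a.score()));
        // Randomize only the leading third so one file's blocks spread over several nodes.
        int head = sorted.size() / 3;
        Collections.shuffle(sorted.subList(0, head), random);
        return sorted;
    }
}
```

Note that with fewer than six nodes the "first third" holds at most one node, so the shuffle
has no effect and the ordering is fully deterministic.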


