hadoop-common-dev mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Mon, 07 Aug 2006 23:40:15 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426366 ] 
            
Yoram Arnon commented on HADOOP-64:
-----------------------------------

dfs.data.dir is currently used to specify the location of temporary files written by the dfs client
(data is written to local disk, then an entire dfs block is streamed to the datanodes). Rather than
trying to support multiple-volume behaviour there too, let's separate the client config
from the datanode config by introducing 'client.tempdata.dir'. The change should be kept backwards compatible.
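
One way the backwards-compatible lookup could work, as a rough sketch only: prefer 'client.tempdata.dir' and fall back to 'dfs.data.dir' when the new key is unset. The class and method names are hypothetical, java.util.Properties stands in for the real Hadoop configuration object, and the fallback rule itself is an assumption about what "backwards compatible" should mean here.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Hadoop. */
final class ClientTempDir {
    static final String NEW_KEY = "client.tempdata.dir"; // proposed client-only key
    static final String OLD_KEY = "dfs.data.dir";        // legacy key shared with the datanode

    /** Returns the client temp directory, or null if neither key is set. */
    static String resolve(Properties conf) {
        String dir = conf.getProperty(NEW_KEY);
        if (dir == null || dir.trim().isEmpty()) {
            // Assumed fallback: old configs that only set dfs.data.dir keep working.
            dir = conf.getProperty(OLD_KEY);
        }
        return dir;
    }
}
```

This sketch does not handle the case where dfs.data.dir holds a list of several volumes; the client would then need some rule for picking one of them.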

Read-only drives are hard to manage except by ignoring them entirely, since data cannot
be deleted from them. If a file is deleted and its blockid is later reclaimed for another file,
bad things might happen if that blockid is still served by some read-only volume. If it's the last
copy of a block, *and* the volume is read-only and on its way to being dead, then that block
is unfortunately lost.

Round robin is a bit harsh as an allocation scheme. Allocation proportional to free space
would work better IMO.
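
To illustrate what allocation proportional to free space could look like, here is a minimal sketch: each volume is picked with probability equal to its share of the total free bytes, so a full volume is never chosen and an emptier one gets more new blocks. The class and method names are hypothetical and this is not the datanode's actual code.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical volume chooser, weighted by free space. */
final class FreeSpaceVolumeChooser {

    /** Picks a volume index at random, weighted by freeBytes[i]. */
    static int choose(long[] freeBytes) {
        long total = 0;
        for (long f : freeBytes) {
            if (f < 0) throw new IllegalArgumentException("negative free space");
            total += f;
        }
        if (total <= 0) throw new IllegalStateException("no free space on any volume");
        // Uniform point in [0, total), then find the volume whose range contains it.
        return pick(freeBytes, ThreadLocalRandom.current().nextLong(total));
    }

    /** Deterministic core: target must lie in [0, sum of freeBytes). */
    static int pick(long[] freeBytes, long target) {
        for (int i = 0; i < freeBytes.length; i++) {
            if (target < freeBytes[i]) return i;
            target -= freeBytes[i];
        }
        throw new IllegalArgumentException("target outside total free space");
    }
}
```

With free space of {100, 0, 300} bytes, the first volume would receive about a quarter of new blocks, the second none, and the third about three quarters. Round robin would instead send a third of the blocks to each, including the full one.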

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs
> its disks JBOD this means running a Datanode per disk on the machine. While the scheme works
> reasonably well on small clusters, on larger installations (several hundred nodes) it implies
> a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
