hadoop-common-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Mon, 07 Aug 2006 22:13:16 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426343 ] 
            
Bryan Pendleton commented on HADOOP-64:
---------------------------------------

Why do datanodes need to checkpoint? What's the value of writing out the mapping, versus
re-enumerating the blocks at startup time? The namenode doesn't keep track of which nodes
have which blocks, so why should a storage node track its own state any more rigorously?
I'd argue that all of that complexity is needless - the cost of maintaining a consistent
state is far too high for so little benefit.

Please make it very easy to change the block-allocation code. The default behavior of the
current code has been causing trouble on my very heterogeneous cluster for a long time -
uniform distribution only makes sense if the same amount of space is available on each drive.
In every other case it leads straight to unnecessary failures.
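
Something like the following is all the pluggability I'm asking for (a hypothetical Java
sketch, not existing Hadoop API): a policy interface the Datanode consults when placing a
block, with a policy that prefers the volume with the most usable space instead of
distributing uniformly, so the small disks on a mixed node don't fill up first.

    import java.io.File;
    import java.util.List;

    interface VolumeChoosingSketch {
        File choose(List<File> volumes, long blockSize);
    }

    class MostAvailableSpacePolicy implements VolumeChoosingSketch {
        public File choose(List<File> volumes, long blockSize) {
            File best = null;
            long bestFree = -1;
            for (File v : volumes) {
                long free = v.getUsableSpace();     // free bytes on this volume
                if (free >= blockSize && free > bestFree) {
                    bestFree = free;
                    best = v;
                }
            }
            return best;                            // null if no volume can hold the block
        }
    }

Swapping policies should be a one-line configuration change, not a patch to the Datanode.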

I'm not sure about the "blocks considered lost on read-only volumes" bit, but if that implies
the blocks become unavailable, then the approach is too heavy-handed. Those blocks might be
the only copies, and ignoring them means the cluster might not be able to find a live copy of
the block anywhere else. Please clarify what a "lost" block is.
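
A less drastic option, sketched below in Java with hypothetical names, would be to drop a
read-only volume from new-block allocation while still including its blocks in block reports
and serving reads from it.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    class VolumeStateSketch {
        static List<File> writableVolumes(List<File> volumes) {
            List<File> writable = new ArrayList<File>();
            for (File v : volumes) {
                if (v.canWrite()) {
                    writable.add(v);    // eligible for new block allocation
                }
                // read-only volumes are skipped here, but their blocks stay in the
                // block report so the namenode still knows those replicas exist
            }
            return writable;
        }
    }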

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs
> its disks JBOD this means running a Datanode per disk on the machine. While the scheme
> works reasonably well on small clusters, on larger installations (several hundred nodes)
> it implies a very large number of Datanodes, with associated management overhead in the
> Namenode.
> The Datanode should be enhanced to handle multiple volumes on a single machine.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
