hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Mon, 07 Aug 2006 22:37:15 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426354 ] 
Milind Bhandarkar commented on HADOOP-64:

About checkpointing datanodes: I agree that it is needless complexity. I was confused about
this as well. But as Konstantin pointed out to me, the datanode checkpointing proposal does
NOT checkpoint the datanodes' state; it includes the datanodes' block reports in the namenode
checkpoint. Thanks, Konstantin.

As the proposal (and the implementation) currently stands, if dfs.data.dir is read-only, the
datanode reports itself as dead, since operations such as block deletion cannot be carried out
on it. The namenode treats that datanode as dead and tries to re-replicate its blocks on other
datanodes. The same behavior will continue, except that the datanode will not report itself as
dead if at least one volume specified in the dfs.data.dir list is read-write. However, it will
not report blocks contained in read-only volumes.
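
The rule above can be sketched in a few lines of Java. This is only an illustration of the
described behavior, not the actual DataNode code: the class VolumeLiveness, the Volume type,
and the method names are all hypothetical, and File.canWrite() is assumed to be a good enough
stand-in for "the volume is read-write".

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the liveness rule; not the actual DataNode code. */
final class VolumeLiveness {

    /** A configured data directory plus whether it is currently read-write. */
    static final class Volume {
        final String path;
        final boolean writable;

        Volume(String path, boolean writable) {
            this.path = path;
            this.writable = writable;
        }

        /** Probes a real directory; canWrite() only approximates "read-write". */
        static Volume probe(File dir) {
            return new Volume(dir.getPath(), dir.isDirectory() && dir.canWrite());
        }
    }

    /** Volumes whose blocks would be included in the block report. */
    static List<Volume> reportable(List<Volume> volumes) {
        List<Volume> result = new ArrayList<>();
        for (Volume v : volumes) {
            if (v.writable) {
                result.add(v);
            }
        }
        return result;
    }

    /** The datanode stays alive as long as at least one volume is read-write. */
    static boolean isAlive(List<Volume> volumes) {
        return !reportable(volumes).isEmpty();
    }
}
```

With one read-write and one read-only volume, isAlive() returns true and reportable() returns
only the read-write volume; with every volume read-only, isAlive() returns false, which matches
the current single-directory behavior.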

Storage-ID continues to be one per datanode. Putting blocks in different volumes is datanode-internal.

DF.java contains code to detect the mount point. The mount point will be used to distinguish
between different disks. Even if the detection is wrong, it does not preclude correct operation
of the datanode; only performance is affected. Performance will be maximized if all volumes
specified in dfs.data.dir are located on different local disks.
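
To illustrate the idea of telling disks apart by mount, here is a rough sketch that groups
configured directories by their backing file store. Note that this does not reproduce what
DF.java does; it swaps in the standard java.nio.file.Files.getFileStore() call as a stand-in,
and the class VolumeGrouping, its methods, and the "unknown:" fallback key are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: group data directories by the file store backing them. */
final class VolumeGrouping {

    /** Default resolver: the name of the file store that holds the directory. */
    static String storeOf(Path dir) {
        try {
            return Files.getFileStore(dir).name();
        } catch (IOException e) {
            // Assumption: if detection fails, treat the directory as its own disk.
            // A wrong guess only affects performance, not correctness.
            return "unknown:" + dir;
        }
    }

    /** Groups directories by resolver key, preserving the configured order. */
    static Map<String, List<Path>> groupByStore(List<Path> dirs,
                                                Function<Path, String> resolver) {
        Map<String, List<Path>> groups = new LinkedHashMap<>();
        for (Path dir : dirs) {
            groups.computeIfAbsent(resolver.apply(dir), k -> new ArrayList<>()).add(dir);
        }
        return groups;
    }
}
```

Calling groupByStore(dirs, VolumeGrouping::storeOf) on the dfs.data.dir entries would show which
directories share a disk; a group with more than one entry means those volumes compete for the
same spindle. The resolver is a parameter so that the grouping can be exercised without real mounts.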

Making read-only mounts visible on the namenode is an orthogonal issue. My proposal specifies
a backward-compatible way of dealing with it.

Using the last x bits of the block ID to map a block to a local directory will minimize the
datanode's state and keep each directory small (since block IDs are random, blocks spread
evenly across the directories). Consider it an implicit hash table on disk.
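
A minimal sketch of the "last x bits" mapping, assuming block IDs are 64-bit longs. The class
BlockDirMapper, the "subdir" directory prefix, and the upper limit of 16 bits are hypothetical
choices made for illustration; the proposal does not fix a value for x.

```java
import java.io.File;

/** Hypothetical helper: maps a block ID to a subdirectory by its low-order bits. */
final class BlockDirMapper {
    private final long mask;

    /** @param bits the "x" in "last x bits"; yields 2^bits directories per volume */
    BlockDirMapper(int bits) {
        if (bits < 1 || bits > 16) {
            throw new IllegalArgumentException("bits must be between 1 and 16");
        }
        this.mask = (1L << bits) - 1;
    }

    /** Bucket index in [0, 2^bits). Masking also works for negative block IDs. */
    int bucketOf(long blockId) {
        return (int) (blockId & mask);
    }

    /** Directory under the volume root where the block would live. */
    File dirFor(File volumeRoot, long blockId) {
        return new File(volumeRoot, "subdir" + bucketOf(blockId));
    }
}
```

Because the directory is a pure function of the block ID, the datanode needs no lookup table to
find a block: new BlockDirMapper(6).dirFor(root, id) always yields the same one of 64
directories for a given id.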

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
> The dfs Datanode can only store data on a single filesystem volume. When a node runs
> its disks JBOD this means running a Datanode per disk on the machine. While the scheme works
> reasonably well on small clusters, on larger installations (several hundred nodes) it implies
> a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

