hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Wed, 09 Aug 2006 20:48:16 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12427026 ] 
Konstantin Shvachko commented on HADOOP-64:

This proposal looks good to me.
The only thing that seems excessive is the dynamic data structures for maintaining
blockid-to-directory mapping.
The alternative is to do a static mapping based on blockids and the number of directories.
Suppose that the maximal number of entries per directory is N. We should define a function
      dirName( blockId, N, dirLevel )
which returns a local directory name for each level of the directory tree.
So the datanode needs to store only the current height of the directory tree, H.
Then for a given blockId, its path is determined by
      / dirName(blockId,N,0) / dirName(blockId,N,1) / ... / dirName(blockId,N,H)
And when the datanode needs to add a new directory level it will not need
to rename anything in the existing directory tree.
One disadvantage of this approach is that the directories must be
restructured if the maximal number of entries per directory is changed.
But the same applies to the dynamic approach, at least when N is decreased.
We might consider hardcoding N rather than having it configurable.
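
A minimal sketch of what such a static mapping could look like. The comment above does not define dirName, so the choice here (the base-N digit of blockId at position dirLevel) and the "dir" name prefix are assumptions for illustration only, as are the Python names dir_name and block_dir.

```python
def dir_name(block_id: int, n: int, dir_level: int) -> str:
    """Hypothetical dirName(blockId, N, dirLevel).

    Assumption: the directory at a given level is chosen by the
    base-n digit of block_id at position dir_level. Python's floor
    division and modulo keep the digit in [0, n) even for negative ids.
    """
    if n < 1:
        raise ValueError("n must be positive")
    if dir_level < 0:
        raise ValueError("dir_level must be non-negative")
    digit = (block_id // n ** dir_level) % n
    return f"dir{digit}"


def block_dir(block_id: int, n: int, height: int) -> str:
    """Build /dirName(id,N,0)/dirName(id,N,1)/.../dirName(id,N,H)."""
    parts = [dir_name(block_id, n, level) for level in range(height + 1)]
    return "/" + "/".join(parts)


if __name__ == "__main__":
    # Only N and the current height H are needed to locate a block.
    print(block_dir(1234, 10, 2))
    # With a larger H, the path for the same block gains a trailing component.
    print(block_dir(1234, 10, 3))
```

With this particular choice, the components for levels 0 through H do not depend on H, so increasing H only appends a component to a block's path and leaves the names of existing directories unchanged. Changing N changes every digit, which matches the restructuring cost noted above.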

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
> The dfs Datanode can only store data on a single filesystem volume. When a node runs
> its disks JBOD this means running a Datanode per disk on the machine. While the scheme works
> reasonably well on small clusters, on larger installations (several hundred nodes) it implies
> a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


