hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Tue, 08 Aug 2006 22:48:17 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426764 ] 
            
Milind Bhandarkar commented on HADOOP-64:
-----------------------------------------

Thanks for your input, Yoram, Konstantin, Bryan, and Sameer.

Here is my modified proposal:

1. The config parameter dfs.data.dir can hold a list of directories separated by commas.
2. Another config parameter (client.buffer.dir) will contain a comma-separated list of directories
used for buffering blocks until they are sent to the datanode. The DFS client will manage the
in-memory map of blocks to these directories.
3. The datanode will maintain an in-memory map of block IDs to storage locations.
4. The datanode will choose an appropriate location to write a block based on a separate
block-to-volume placement strategy. Information about the volumes will be made available to
this strategy via DF.
5. The datanode will try to report the correct available disk space by appropriately taking into
account the space reported by DF on each volume. If more than one volume shares the same mount
point, the available disk space will not be counted twice.
6. The storage ID will be unique per datanode, and will be stored at the top level of each
volume.
7. Each volume will further be divided into a shallow directory hierarchy, with a maximum
of N blocks per directory. This block-to-directory mapping will also be maintained in a hashtable
by the datanode. As a directory fills up, a new directory will be created as a sibling, up to a
maximum of N siblings. Then a second level of directories will start. The parameter N can be
specified via the config variable "dfs.data.numdir".
8. The datanode will shut down only if all the volumes specified in dfs.data.dir are read-only.
Otherwise, it will log the read-only directories and treat them as if they had never been specified
in the dfs.data.dir list. This behavior is consistent with the current state of the implementation.
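To make points 1 and 4 concrete, here is a minimal sketch in Java of parsing a comma-separated dfs.data.dir value and one possible placement strategy that picks the volume with the most free space. The class and method names (VolumeChooser, parseDataDirs, chooseVolume) are illustrative assumptions, not the actual implementation; the free-space query stands in for the information DF would provide.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not Hadoop's actual code.
public class VolumeChooser {

    // Point 1: split the dfs.data.dir config value into individual volumes.
    public static List<File> parseDataDirs(String dfsDataDir) {
        List<File> volumes = new ArrayList<File>();
        for (String dir : dfsDataDir.split(",")) {
            String trimmed = dir.trim();
            if (!trimmed.isEmpty()) {
                volumes.add(new File(trimmed));
            }
        }
        return volumes;
    }

    // Point 4: one possible block-to-volume placement strategy -- write the
    // next block to the volume reporting the most usable space (stand-in for
    // the per-volume information DF would supply).
    public static File chooseVolume(List<File> volumes) {
        File best = null;
        long bestFree = -1;
        for (File v : volumes) {
            long free = v.getUsableSpace();
            if (free > bestFree) {
                bestFree = free;
                best = v;
            }
        }
        return best;
    }
}
```

Other strategies (round-robin, weighted by volume size) would plug in at chooseVolume without touching the parsing.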
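Point 7 can be sketched by treating the block ordinal on a volume as a base-N number whose digits name the directory levels: the first N blocks go in one directory, new sibling directories are created up to N of them, and only then does a second level start. This is a hypothetical illustration of the layout rule; the class and the "subdir" naming are assumptions, not the actual on-disk format.

```java
// Hypothetical sketch of the shallow directory hierarchy (point 7):
// at most N blocks per directory, at most N sibling directories per level.
public class BlockDirLayout {
    private final int n;  // the "dfs.data.numdir" parameter

    public BlockDirLayout(int n) { this.n = n; }

    // Relative directory path for the k-th block stored on a volume.
    public String dirFor(long blockOrdinal) {
        long dirIndex = blockOrdinal / n;  // which directory, N blocks each
        StringBuilder path = new StringBuilder();
        do {
            // Each base-N digit of dirIndex names one directory level.
            path.insert(0, "/subdir" + (dirIndex % n));
            dirIndex /= n;
        } while (dirIndex > 0);
        return path.toString();
    }
}
```

With N = 64, blocks 0..63 land in one directory, block 64 starts the next sibling, and block 4096 (after all 64 siblings fill) opens the second level. The datanode's hashtable would cache the result of this mapping per block ID.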

If there are any other issues to think about, please comment.


> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its
> disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works
> reasonably well on small clusters, on larger installations (several hundred nodes) it implies
> a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
