hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Mon, 07 Aug 2006 22:22:16 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426347 ] 
Sameer Paranjpye commented on HADOOP-64:

Can we effectively map volumes to devices on Windows? Will 'df' under cygwin produce a
comprehensible mapping of paths to devices? Maybe this should be left out of the implementation?

Code for monitoring disk capacity on the datanode will need to be updated to run 'df' on all
volumes considered.  Round robin placement needs to account for differences in capacity on
the various volumes.
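
As a rough illustration of the placement concern above, here is a minimal sketch of a round-robin chooser that skips volumes without enough free space. The class name VolumeChooser and its methods are hypothetical and not taken from the Hadoop code base. The free-space lookup is passed in as a function, so it could be backed by parsing 'df' output or, on a newer JDK, by File.getUsableSpace(). Note that this only avoids full volumes; it does not weight placement by capacity.

```java
import java.io.File;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: round-robin volume choice that skips volumes lacking free space. */
public class VolumeChooser {
    private final List<File> volumes;
    // Assumption: the caller supplies the free-space lookup, e.g. File::getUsableSpace
    // or a wrapper around 'df' output.
    private final ToLongFunction<File> freeSpace;
    private int next = 0;

    public VolumeChooser(List<File> volumes, ToLongFunction<File> freeSpace) {
        if (volumes.isEmpty()) {
            throw new IllegalArgumentException("at least one volume is required");
        }
        this.volumes = volumes;
        this.freeSpace = freeSpace;
    }

    /**
     * Returns the next volume in rotation that reports at least blockSize free bytes,
     * or null if no volume has room.
     */
    public synchronized File choose(long blockSize) {
        for (int i = 0; i < volumes.size(); i++) {
            File candidate = volumes.get(next);
            next = (next + 1) % volumes.size();
            if (freeSpace.applyAsLong(candidate) >= blockSize) {
                return candidate;
            }
        }
        return null;
    }
}
```

A caller might construct it as new VolumeChooser(volumes, File::getUsableSpace) and call choose(blockSize) for each incoming block.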

How does this interact with Konstantin's storage id implementation? We will now need to have
1 storage-id across multiple volumes.

Do we need to use the last x-bits of a block to map it to a directory? Maybe we should use
a simple round robin scheme here as well. The amount of state is small enough to keep in a
hashtable, no?
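
To make the two options concrete, here is a small sketch comparing them. It assumes a block is identified by a long ID; the class BlockDirMapper and its method names are hypothetical. Option A derives the directory from the low-order bits of the ID and keeps no state. Option B hands out directories round robin and remembers each assignment in a hash table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of two ways to map a block to a subdirectory index. */
public class BlockDirMapper {
    /** Option A: use the last 'bits' bits of the block ID (bits must be in 0..30). */
    public static int dirFromLowBits(long blockId, int bits) {
        if (bits < 0 || bits > 30) {
            throw new IllegalArgumentException("bits must be between 0 and 30");
        }
        return (int) (blockId & ((1L << bits) - 1));
    }

    // Option B state: blockId -> directory index, filled in round-robin order.
    private final Map<Long, Integer> assigned = new HashMap<>();
    private final int numDirs;
    private int next = 0;

    public BlockDirMapper(int numDirs) {
        if (numDirs <= 0) {
            throw new IllegalArgumentException("numDirs must be positive");
        }
        this.numDirs = numDirs;
    }

    /** Option B: assign directories in rotation; repeat lookups return the same directory. */
    public synchronized int dirRoundRobin(long blockId) {
        Integer existing = assigned.get(blockId);
        if (existing != null) {
            return existing;
        }
        int dir = next;
        next = (next + 1) % numDirs;
        assigned.put(blockId, dir);
        return dir;
    }
}
```

Option A needs no table and survives a restart unchanged. Option B fills directories evenly regardless of how IDs are distributed, but the table would have to be rebuilt (for example by scanning the directories) after a restart.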

Do we ever need to checkpoint datanodes? Seems like that is a separable discussion. In any
case, it seems like the less state we keep in side files the better it is.

We should include a mechanism to make read-only volumes visible on the namenode, as part of
the health/status page, so that admins can be alerted in a timely manner.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
> The dfs Datanode can only store data on a single filesystem volume. When a node runs
> its disks as JBOD this means running a Datanode per disk on the machine. While the scheme works
> reasonably well on small clusters, on larger installations (several hundred nodes) it implies
> a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.
