hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Tue, 08 Aug 2006 02:29:16 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426393 ] 
            
Konstantin Shvachko commented on HADOOP-64:
-------------------------------------------

= I believe there was a misunderstanding on the datanode checkpointing issue.
HADOOP-306 proposes to checkpoint only the list of datanodes, effectively DatanodeInfo.
It was not meant to store the datanode block reports.
The block map is not and should not be checkpointed.

= DF on Windows will return the drive letter, which can be used to distinguish disks.
It will work only for local disks though; mounted (mapped network) drives on Windows
won't work.

= I agree the storageID should be the same per node. It will need to be stored
separately on each drive; otherwise, if only one drive stores the id and gets
corrupted, we will not be able to restore the storage id for the other drives.
Also, the storage file on each drive should be locked when the datanode starts,
to prevent running multiple data nodes with the same blocks.
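The per-drive locking described above could be done with java.nio file locks. A
minimal sketch, assuming one storage file per drive; the class and method names
here are hypothetical, not actual datanode code:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class StorageLock {

    // Try to take an exclusive OS-level lock on the storage file of one
    // drive. Returns null if another process (e.g. a second datanode
    // instance) already holds the lock. The lock is released when the
    // process exits, so a crashed datanode does not leave drives locked.
    static FileLock tryLockStorage(File storageFile) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(storageFile, "rw");
        FileLock lock = raf.getChannel().tryLock();
        if (lock == null) {
            raf.close(); // someone else owns the storage; back off
        }
        return lock;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("storage", ".lock");
        FileLock lock = tryLockStorage(f);
        System.out.println(lock != null ? "locked" : "already locked");
    }
}
```

On startup the datanode would attempt the lock on every configured drive and
refuse to serve a drive whose lock it cannot acquire.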

= It is a good idea that the number of directories is a power of 2.
But I do not support the idea of reserving some bits of the block-id to determine
block locations, for 2 reasons:
a) Block replicas can have different locations on different data nodes.
b) The block id is issued by the namenode, and it is not good if the namenode
needs to know about a datanode's storage setup.
Instead, we can partition the bit representation of the block id into parts whose
width matches the number of directories and e.g. XOR them. The result will
represent the directory name.
I think this will be random enough.
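The XOR scheme above can be sketched as follows: split the block id's bits into
chunks of width log2(numDirs) and fold them together with XOR to pick a
subdirectory. This is only an illustration of the idea; the class and method
names are hypothetical:

```java
public class BlockDirMapper {

    // numDirs must be a power of 2, e.g. 64.
    static int dirIndex(long blockId, int numDirs) {
        int bits = Integer.numberOfTrailingZeros(numDirs); // chunk width
        long mask = numDirs - 1L;                          // low-bits mask
        long result = 0;
        // Unsigned shift so the loop terminates for negative ids too.
        for (long id = blockId; id != 0; id >>>= bits) {
            result ^= (id & mask); // fold the next chunk into the index
        }
        return (int) result;       // always in [0, numDirs)
    }

    public static void main(String[] args) {
        // 66 = 0b1000010: chunks 0b000010 and 0b000001, XOR = 3
        System.out.println("subdir" + dirIndex(66L, 64));
    }
}
```

Because each datanode applies the fold with its own directory count, replicas of
the same block can land in differently named directories on different nodes,
and the namenode never needs to know the layout.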

= I don't think the datanode can even start on a read-only disk.
The storage file won't open.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs
its disks JBOD this means running a Datanode per disk on the machine. While the scheme works
reasonably well on small clusters, on larger installations (several hundred nodes) it implies
a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
