hadoop-hdfs-issues mailing list archives

From "Wang Xu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1362) Provide volume management functionality for DataNode
Date Wed, 16 Mar 2011 03:08:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007330#comment-13007330 ]

Wang Xu commented on HDFS-1362:


Most SATA controllers support hot-swap, and all SATA devices support it. (See the libata wiki
at https://ata.wiki.kernel.org )

As for the operational issue: many servers have per-disk status LEDs, and some of them are
programmable, so the management system can use them to identify the failed disk. Without such
status identification, it is indeed hard for maintainers to find the right disk.

My assumption is that the workflow would be:
# Manually replace the disk.
# Find the new device and enable it, then make a local file system on it, mount it, and create
the essential directories. This step could be done by an external management system or manually.
# Re-enable the disk in Hadoop.
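
The last part of the second step (creating the essential directories on the freshly mounted
file system) could look roughly like the sketch below. It is only an illustration: the class
name VolumePrep, the method prepare, and the list of sub-directory names are hypothetical and
not part of the patch, and making the file system and mounting it are assumed to have been
done already by the operator or the management system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not part of HDFS-1362: prepares a newly mounted volume. */
public class VolumePrep {

    /**
     * Checks that the mount point is a writable directory and creates the
     * given sub-directories under it. The sub-directory names are supplied
     * by the caller; this sketch does not assume any particular layout.
     */
    public static Path prepare(Path mountPoint, List<String> subDirs) throws IOException {
        if (!Files.isDirectory(mountPoint) || !Files.isWritable(mountPoint)) {
            throw new IOException("Not a writable directory: " + mountPoint);
        }
        for (String sub : subDirs) {
            // createDirectories is a no-op for directories that already exist.
            Files.createDirectories(mountPoint.resolve(sub));
        }
        return mountPoint;
    }
}
```

Failing early on a mount point that is missing or read-only keeps a badly prepared disk from
being re-enabled in the third step.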


Thanks for the code review. recoverTransitionRead and recoverTransitionAdditionalRead are
almost the same, except for the "writeAll" call at the end: when we add additional disks, we
should not call writeAll(). Should we split recoverTransitionRead into separate parts and
re-use them?
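
One possible shape for that split is sketched below. This is a simplified stand-in, not the
actual DataNode storage code: the class StorageSketch, the helper recoverDirs, the String
directory names and the log list are all assumptions made for illustration, and the real
methods take different parameters. It only shows the shared recovery part being factored out,
with writeAll() kept in the startup path alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the DataNode storage class; not the real Hadoop code. */
public class StorageSketch {
    /** Records the order of operations so the behaviour can be inspected. */
    public final List<String> log = new ArrayList<>();
    private final List<String> storageDirs = new ArrayList<>();

    /** Shared part (assumed): recover each directory and register it. */
    private void recoverDirs(List<String> dataDirs) {
        for (String dir : dataDirs) {
            log.add("recover " + dir);
            storageDirs.add(dir);
        }
    }

    /** Startup path: recover the configured directories, then write all of them. */
    public void recoverTransitionRead(List<String> dataDirs) {
        recoverDirs(dataDirs);
        writeAll();
    }

    /** Online-add path: same recovery, but without the final writeAll(). */
    public void recoverTransitionAdditionalRead(List<String> newDirs) {
        recoverDirs(newDirs);
    }

    /** Placeholder for persisting state to every registered directory. */
    private void writeAll() {
        log.add("writeAll " + storageDirs.size());
    }
}
```

With this layout the two entry points differ only in whether writeAll() is called, so any fix
to the recovery logic applies to both.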

> Provide volume management functionality for DataNode
> ----------------------------------------------------
>                 Key: HDFS-1362
>                 URL: https://issues.apache.org/jira/browse/HDFS-1362
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>    Affects Versions: 0.23.0
>            Reporter: Wang Xu
>            Assignee: Wang Xu
>             Fix For: 0.23.0
>         Attachments: DataNode Volume Refreshment in HDFS-1362.pdf, HDFS-1362.4_w7001.txt,
HDFS-1362.5.patch, HDFS-1362.6.patch, HDFS-1362.7.patch, HDFS-1362.txt, Provide_volume_management_for_DN_v1.pdf
> The current management unit in Hadoop is a node, i.e. if a node fails, it will be kicked
out and all the data on the node will be replicated.
> As almost all SATA controllers support hotplug, we add a new command line interface to
the datanode, so that it can list, add or remove a volume online, which means we can change a
disk without decommissioning the node. Moreover, if the failed disk is still readable and the
node has enough space, it can migrate the data on the failed disk to other disks in the same node.
> A more detailed design document will be attached.
> The original version in our lab is implemented directly against the 0.20 datanode. Is
it better to implement it in contrib? Or are there any other suggestions?
