hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
Date Sat, 16 Jan 2016 15:13:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103223#comment-15103223 ]

Tsz Wo Nicholas Sze commented on HDFS-8578:
-------------------------------------------

{quote}
1. Replace addBlockPoolStorage method with getBlockPoolSliceStorage. This just changes the
synchronization around adding things to bpStorageMap. The change isn't necessary for the single
threaded case, but doesn't hurt so we can commit it before parallelizing.
2. Add the new createStorageID method to correctly handle upgrading from layout versions earlier
than ADD_DATANODE_AND_STORAGE_UUIDS. This seems like a slightly separate issue, but I may
be missing something. In any case, we can probably commit this as well without the parallelization.
{quote}
Sure, let's do the code refactoring first, before switching to processing all storage/data dirs in parallel.
Filed HDFS-9654.

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, HDFS-8578-03.patch, HDFS-8578-04.patch,
HDFS-8578-05.patch, HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, HDFS-8578-09.patch,
HDFS-8578-10.patch, HDFS-8578-11.patch, HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch,
HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, HDFS-8578-branch-2.6.0.patch,
HDFS-8578-branch-2.7-001.patch, HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch,
h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during an upgrade the datanode processes all of its storage dirs sequentially.
If it takes ~20 mins to process a single storage dir, then a datanode with ~10 disks
will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>     for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>       doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>       assert getCTime() == nsInfo.getCTime()
>           : "Data-node and name-node CTimes must be the same.";
>     }
> {code}
> It would save a lot of time during major upgrades if the datanode processed all storage dirs/disks
in parallel.
> Can we make the datanode process all storage dirs in parallel?
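
For illustration only, the request above could be sketched roughly as follows. This is not the attached patch and does not use the real datanode classes: ParallelStorageDirSketch, DirTask and processAll are hypothetical names, and DirTask.process stands in for the per-directory doTransition call. It assumes each storage dir can be processed independently, in which case ten dirs at ~20 mins each would take roughly as long as the slowest dir rather than ~200 mins in total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStorageDirSketch {

  /** Hypothetical stand-in for the per-directory work (doTransition in the loop above). */
  interface DirTask {
    void process(String storageDir) throws Exception;
  }

  /**
   * Runs the task once per storage dir, one thread per dir, and waits for all of them.
   * Returns the dirs in submission order; the first failure surfaces as an ExecutionException.
   */
  static List<String> processAll(List<String> storageDirs, DirTask task)
      throws InterruptedException, ExecutionException {
    List<String> done = new ArrayList<>();
    if (storageDirs.isEmpty()) {
      return done; // newFixedThreadPool rejects a pool size of 0
    }
    ExecutorService pool = Executors.newFixedThreadPool(storageDirs.size());
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String dir : storageDirs) {
        Callable<String> call = () -> {
          task.process(dir);
          return dir;
        };
        futures.add(pool.submit(call));
      }
      for (Future<String> f : futures) {
        done.add(f.get()); // blocks until this dir has finished
      }
      return done;
    } finally {
      pool.shutdownNow(); // release the worker threads, also on failure
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumed directory names; the sleep simulates the slow per-dir upgrade.
    List<String> dirs = Arrays.asList("/data1", "/data2", "/data3");
    List<String> done = processAll(dirs, dir -> Thread.sleep(100));
    System.out.println("processed " + done);
  }
}
```

In a real change, any state shared between directories (for example the CTime check in the loop above) would need its own synchronization, or would have to move after the point where all tasks have completed.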



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
