hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
Date Mon, 28 Sep 2015 21:44:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934110#comment-14934110
] 

Haohui Mai commented on HDFS-8578:
----------------------------------

bq. Datanode may *require more memory* to process all volumes/disks in parallel.

That doesn't sound right to me. Can you identify why it requires more memory if the tasks
of each threads are to create hardlinks for blocks? Note that in the current implementation
the lists of blocks have been pre-calculated.

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, HDFS-8578-03.patch, HDFS-8578-04.patch,
HDFS-8578-05.patch, HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, HDFS-8578-09.patch,
HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades datanode is processing all the storage dirs sequentially.
Assume it takes ~20 mins to process a single storage dir then  datanode which has ~10 disks
will take around 3hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>       doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>       assert getCTime() == nsInfo.getCTime() 
>           : "Data-node and name-node CTimes must be the same.";
>     }
> {code}
> It would save lots of time during major upgrades if datanode process all storagedirs/disks
parallelly.
> Can we make datanode to process all storage dirs parallelly?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message