hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
Date Fri, 04 Dec 2015 19:01:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041997#comment-15041997
] 

Vinayakumar B commented on HDFS-8578:
-------------------------------------

bq. Hard-linking is also parallelized within each of these threads (default is 12 threads).
So the maximum number of threads you would potentially see is 12 disks (really it is storage
directories, but let's assume there is 1 storage dir per disk) * 3 namespaces * 12 (default
but configurable) hard-link worker threads = 432 threads.

I missed the hard-link workers in my earlier comment. So the effective number of threads would
be 12 * 12 = 144, since, as I said, there is no parallelism across namespaces.
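
The two figures above are the same product with a different namespace factor. A minimal sketch of that arithmetic, assuming 12 storage dirs and 12 hard-link workers as in the comment; the class and method names are hypothetical and not part of HDFS:

```java
public class UpgradeThreadEstimate {
    /**
     * Upper bound on concurrent upgrade threads: every storage dir processed
     * in parallel, times the namespaces processed in parallel, times the
     * hard-link workers used per storage dir.
     */
    public static int maxThreads(int storageDirs, int parallelNamespaces, int hardLinkWorkers) {
        return storageDirs * parallelNamespaces * hardLinkWorkers;
    }

    public static void main(String[] args) {
        // Quoted estimate: 3 namespaces assumed to run in parallel.
        System.out.println(maxThreads(12, 3, 12)); // 432
        // Corrected estimate: namespaces are handled one at a time.
        System.out.println(maxThreads(12, 1, 12)); // 144
    }
}
```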

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, HDFS-8578-03.patch, HDFS-8578-04.patch,
HDFS-8578-05.patch, HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, HDFS-8578-09.patch,
HDFS-8578-10.patch, HDFS-8578-11.patch, HDFS-8578-12.patch, HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades, the datanode processes all the storage dirs sequentially.
Assuming it takes ~20 mins to process a single storage dir, a datanode which has ~10 disks
will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
> for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime()
>       : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save a lot of time during major upgrades if the datanode processed all storage dirs/disks
in parallel.
> Can we make the datanode process all storage dirs in parallel?
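
For illustration only, the request in the description could be sketched with a fixed thread pool that submits one task per storage dir and then waits for all of them. This is not the attached patch: `ParallelStorageDirUpgrade`, `DirTask`, and `processAll` are hypothetical names, and `DirTask` merely stands in for the per-directory work that `doTransition` performs in the loop quoted in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStorageDirUpgrade {
    /** Hypothetical stand-in for the per-directory upgrade work. */
    public interface DirTask {
        void process(String storageDir) throws Exception;
    }

    /**
     * Runs the task for every storage dir on a fixed-size pool and returns
     * the dirs that completed, in submission order. A failure in any task
     * surfaces as an ExecutionException from Future.get().
     */
    public static List<String> processAll(List<String> storageDirs, DirTask task, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String dir : storageDirs) {
                // One task per storage dir instead of one loop iteration.
                futures.add(pool.submit(() -> {
                    task.process(dir);
                    return dir;
                }));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> f : futures) {
                done.add(f.get()); // blocks until this dir has finished
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }
}
```

Under the description's assumption of ~20 mins per storage dir and ~10 disks, the sequential loop costs roughly 10 * 20 = 200 mins, while a pool with one thread per dir would ideally finish in about the time of the slowest dir. Actual gains depend on disk and CPU contention.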



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
