hadoop-hdfs-issues mailing list archives

From "Chris Trezzo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
Date Mon, 04 Jan 2016 23:23:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081991#comment-15081991 ]

Chris Trezzo commented on HDFS-8578:
------------------------------------

Hi [~szetszwo], thanks for the updated patch! One comment so far:

I think that we can get rid of the SubmissionService class and directly use an ExecutorService
instead. You can use the shutdown and awaitTermination methods to wait for all of the doUpgrade
tasks to complete. This way we will not need an extra class or to keep track of the number
of tasks submitted.
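
As a rough illustration of the suggestion above, here is a minimal sketch that uses a plain ExecutorService and waits with shutdown and awaitTermination. It is not taken from any of the attached patches: the class name, the doUpgrade stand-in, the thread count parameter and the one-hour timeout are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownAwaitSketch {
  // Hypothetical stand-in for the real per-directory doUpgrade work.
  static void doUpgrade(String storageDir, AtomicInteger completed) {
    completed.incrementAndGet();
  }

  // Submits one task per storage dir, then blocks until all of them finish.
  // Returns the number of tasks that ran to completion.
  static int upgradeAll(List<String> storageDirs, int numThreads)
      throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    AtomicInteger completed = new AtomicInteger();
    for (String dir : storageDirs) {
      executor.execute(() -> doUpgrade(dir, completed));
    }
    // shutdown() stops new submissions; already submitted tasks still run.
    executor.shutdown();
    // The timeout value is an arbitrary choice for this sketch.
    if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
      executor.shutdownNow();
      throw new IllegalStateException("upgrade tasks did not finish in time");
    }
    return completed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> dirs = Arrays.asList("/data/1", "/data/2", "/data/3");
    System.out.println("completed tasks: " + upgradeAll(dirs, 2));
  }
}
```

With this shape the caller never counts submitted tasks: awaitTermination returns true only once every submitted task has finished.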

You might need to pass one more list of futures into the methods so that when the callables
are submitted we can keep track of the futures to fill the success list at the end. I am not
totally convinced that we need this yet though.
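
If the list of futures does turn out to be needed, it could look roughly like the sketch below: each callable is submitted, its Future is kept, and the success list is filled at the end from the futures that completed normally. The class name, the doUpgrade stand-in (which treats an empty path as a failure) and the use of String for a storage dir are assumptions for illustration only, not the patch's actual types.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureListSketch {
  // Hypothetical stand-in for doUpgrade: returns the dir on success,
  // throws on failure (here, an empty path counts as a failure).
  static String doUpgrade(String storageDir) throws IOException {
    if (storageDir.isEmpty()) {
      throw new IOException("cannot upgrade an unnamed storage dir");
    }
    return storageDir;
  }

  // Submits one callable per storage dir and collects the dirs whose
  // task succeeded, in submission order.
  static List<String> upgradeAll(List<String> storageDirs, int numThreads)
      throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    List<Future<String>> futures = new ArrayList<>();
    try {
      for (String dir : storageDirs) {
        Callable<String> task = () -> doUpgrade(dir);
        futures.add(executor.submit(task));
      }
      List<String> success = new ArrayList<>();
      for (Future<String> future : futures) {
        try {
          // get() blocks until this task is done.
          success.add(future.get());
        } catch (ExecutionException e) {
          // The task threw; leave this dir out of the success list.
        }
      }
      return success;
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> dirs = Arrays.asList("/data/1", "", "/data/3");
    System.out.println("upgraded: " + upgradeAll(dirs, 2));
  }
}
```

Calling get() on every future already waits for all tasks, so in this variant no separate task counter is needed either.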

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, HDFS-8578-03.patch, HDFS-8578-04.patch,
HDFS-8578-05.patch, HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, HDFS-8578-09.patch,
HDFS-8578-10.patch, HDFS-8578-11.patch, HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch,
HDFS-8578-15.patch, HDFS-8578-16.patch, HDFS-8578-17.patch, HDFS-8578-branch-2.6.0.patch,
HDFS-8578-branch-2.7-001.patch, HDFS-8578-branch-2.7-002.patch, HDFS-8578-branch-2.7-003.patch,
h8578_20151210.patch, h8578_20151211.patch, h8578_20151211b.patch, h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades, the datanode processes all the storage dirs sequentially. Assuming it takes ~20 mins to process a single storage dir, a datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>     for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>       doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>       assert getCTime() == nsInfo.getCTime()
>           : "Data-node and name-node CTimes must be the same.";
>     }
> {code}
> It would save a lot of time during major upgrades if the datanode processed all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
