Date: Mon, 4 Jan 2016 23:23:40 +0000 (UTC)
From: "Chris Trezzo (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

    [ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081991#comment-15081991 ]

Chris Trezzo commented on HDFS-8578:
------------------------------------

Hi [~szetszwo], thanks for the updated patch! One comment so far: I think we can get rid of the SubmissionService class and use an ExecutorService directly instead. You can use the shutdown and awaitTermination methods to wait for all of the doUpgrade tasks to complete. That way we do not need an extra class, and we do not need to keep track of the number of tasks submitted. You might need to pass one more list of futures into the methods so that, when the callables are submitted, we can keep track of the futures and fill the success list at the end. I am not totally convinced that we need this yet, though. (A rough sketch of this approach is at the end of this message.)

> On upgrade, Datanode should process all storage/data dirs in parallel
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, HDFS-8578-03.patch,
> HDFS-8578-04.patch, HDFS-8578-05.patch, HDFS-8578-06.patch, HDFS-8578-07.patch,
> HDFS-8578-08.patch, HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch,
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch, HDFS-8578-15.patch,
> HDFS-8578-16.patch, HDFS-8578-17.patch, HDFS-8578-branch-2.6.0.patch,
> HDFS-8578-branch-2.7-001.patch, HDFS-8578-branch-2.7-002.patch,
> HDFS-8578-branch-2.7-003.patch, h8578_20151210.patch, h8578_20151211.patch,
> h8578_20151211b.patch, h8578_20151212.patch, h8578_20151213.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs sequentially. If it takes ~20 minutes to process a single storage dir, then a datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
> for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime()
>       : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save a lot of time during major upgrades if the datanode processed all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
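
For illustration, here is a minimal, self-contained sketch of the approach suggested in the comment above: submit one Callable per storage directory to an ExecutorService, call shutdown() and awaitTermination() to wait for all of the tasks, and keep a list of Futures so the success list can be filled at the end. The StorageDir class, upgradeOneDir method, and ParallelUpgradeSketch class are hypothetical stand-ins for the real BlockPoolSliceStorage/doUpgrade code, not code from the patch.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a real storage directory.
class StorageDir {
  final String path;
  StorageDir(String path) { this.path = path; }
}

public class ParallelUpgradeSketch {

  // Stand-in for the per-directory doUpgrade work.
  static void upgradeOneDir(StorageDir dir) {
    System.out.println("upgrading " + dir.path);
  }

  // Upgrade all dirs in parallel and return the ones that succeeded.
  static List<StorageDir> upgradeAll(List<StorageDir> dirs)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(dirs.size());
    List<Future<StorageDir>> futures = new ArrayList<>();

    // Submit one callable per storage dir; each returns its dir on success.
    for (StorageDir dir : dirs) {
      Callable<StorageDir> task = () -> {
        upgradeOneDir(dir);
        return dir;
      };
      futures.add(pool.submit(task));
    }

    // Stop accepting new tasks and wait for the submitted ones to finish.
    pool.shutdown();
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);

    // Fill the success list from the futures; a failed dir is skipped.
    List<StorageDir> succeeded = new ArrayList<>();
    for (Future<StorageDir> f : futures) {
      try {
        succeeded.add(f.get());
      } catch (ExecutionException e) {
        // The upgrade of this dir threw; the real code would log and continue.
      }
    }
    return succeeded;
  }

  public static void main(String[] args) throws InterruptedException {
    List<StorageDir> dirs = new ArrayList<>();
    dirs.add(new StorageDir("/data1/dfs"));
    dirs.add(new StorageDir("/data2/dfs"));
    System.out.println("succeeded: " + upgradeAll(dirs).size());
  }
}
{code}

Sizing the pool to the number of directories is just one choice for the sketch; the real patch might cap the pool size or make it configurable.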