Date: Mon, 10 Apr 2017 22:50:41 +0000 (UTC)
From: "Uma Maheswara Rao G (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13034076.1484199622000.251957.1491864641756@Atlassian.JIRA>
In-Reply-To: <JIRA.13034076.1484199622000@Atlassian.JIRA>
References: <JIRA.13034076.1484199622000@Atlassian.JIRA> <JIRA.13034076.1484199622399@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HDFS-11338) [SPS]: Fix timeout issue in unit
 tests caused by longger NN down time
archived-at: Mon, 10 Apr 2017 22:50:45 -0000


    [ https://issues.apache.org/jira/browse/HDFS-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963614#comment-15963614 ] 

Uma Maheswara Rao G commented on HDFS-11338:
--------------------------------------------

It is good to see the failures fixed now. However, I have a few comments on the changes.
join() is a Thread method, so exposing it directly on non-thread classes such as BlockStorageMovementAttemptedItems may not be appropriate, IMO.

How about adding another method called disable() instead of join()? It would disable the functionality and interrupt the internal threads, e.g. by setting the running flags to false and then interrupting the threads.
Then rename the current stop method to stopGracefully(). It should do the following: if the thread is still running, interrupt it and then join; if it is not running, just join, so that the stop is graceful.
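A minimal Java sketch of the proposed split, assuming a component that owns one background thread guarded by a volatile running flag. The class name SpsWorkerSketch and its loop body are hypothetical, not the actual HDFS-10285 code. disable() clears the flag and interrupts without waiting; stopGracefully() calls disable() only if the component is still enabled, then joins.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the proposed disable()/stopGracefully() split.
 * Names and the loop body are illustrative, not the HDFS-10285 code.
 */
class SpsWorkerSketch implements Runnable {
  // Cleared by disable(); the worker loop re-checks it on every iteration.
  private volatile boolean running = true;
  private final Thread thread;

  SpsWorkerSketch(String name) {
    thread = new Thread(this, name);
    thread.setDaemon(true);
  }

  void start() {
    thread.start();
  }

  @Override
  public void run() {
    while (running) {
      try {
        // Stand-in for periodic work, e.g. checking attempted items.
        TimeUnit.MILLISECONDS.sleep(100);
      } catch (InterruptedException ie) {
        // Woken by disable(): the loop condition re-checks the flag.
      }
    }
  }

  /** Non-graceful step: turn the feature off and wake the thread, without waiting. */
  void disable() {
    running = false;
    thread.interrupt();
  }

  /** Graceful stop: disable if still enabled, then wait for the thread to exit. */
  void stopGracefully() {
    if (running) {
      disable();
    }
    try {
      thread.join();
    } catch (InterruptedException ie) {
      // Preserve the caller's interrupt status instead of swallowing it.
      Thread.currentThread().interrupt();
    }
  }

  boolean isAlive() {
    return thread.isAlive();
  }

  public static void main(String[] args) {
    SpsWorkerSketch worker = new SpsWorkerSketch("sps-worker");
    worker.start();
    worker.stopGracefully();
    System.out.println("alive after stop: " + worker.isAlive()); // prints "alive after stop: false"
  }
}
```

Keeping the checked InterruptedException inside stopGracefully() lets callers such as a NameNode shutdown path invoke it without extra try/catch blocks; whether the real code should bound the join with a timeout is left open here.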

So, if you want a two-step stop to save time, first call disable() (this is not a graceful stop), then interrupt the other system threads, and finally call stopGracefully() (this ensures a graceful stop: it calls disable() if the component is not already disabled, and then joins).
1. Use stopGracefully() for the dynamic start/stop feature.
2. Use the two-step stop for the NN start/stop case to reduce shutdown time.
Thoughts?
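The two-step stop can be sketched with plain threads, assuming each background component is modelled as one Thread and its disable() as a Thread.interrupt() call. TwoStepStopSketch, newWorker and the simulated cleanup delay are hypothetical illustrations, not NameNode code. Every thread is signalled first and only then joined, so their wind-down times overlap.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the two-step stop. Each thread stands in for a
 * component, and Thread.interrupt() stands in for its disable() call.
 */
final class TwoStepStopSketch {
  private TwoStepStopSketch() {}

  /** Step 1 (not graceful): signal every thread, without waiting for any. */
  static void disableAll(List<Thread> threads) {
    for (Thread t : threads) {
      t.interrupt();
    }
  }

  /** Step 2 (graceful): wait until every signalled thread has exited. */
  static void joinAll(List<Thread> threads) {
    boolean interrupted = false;
    for (Thread t : threads) {
      while (t.isAlive()) {
        try {
          t.join();
        } catch (InterruptedException ie) {
          interrupted = true; // keep waiting, re-assert the status afterwards
        }
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
  }

  /** Worker that idles until interrupted, then spends cleanupMs winding down. */
  static Thread newWorker(String name, long cleanupMs) {
    Thread t = new Thread(() -> {
      try {
        while (true) {
          Thread.sleep(50); // idle until told to stop
        }
      } catch (InterruptedException stop) {
        try {
          Thread.sleep(cleanupMs); // simulated shutdown work
        } catch (InterruptedException again) {
          Thread.currentThread().interrupt();
        }
      }
    }, name);
    t.setDaemon(true);
    return t;
  }

  public static void main(String[] args) {
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      Thread t = newWorker("bg-" + i, 200);
      workers.add(t);
      t.start();
    }
    long start = System.nanoTime();
    disableAll(workers);
    // The NN would interrupt its other system threads here, before joining.
    joinAll(workers);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
    System.out.println("stopped " + workers.size() + " workers in " + elapsedMs + " ms");
  }
}
```

Because all three workers are signalled before any join, their 200 ms cleanups should overlap, so the elapsed time should be close to one cleanup period rather than three; interrupting and joining them one at a time would add the periods up.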


> [SPS]: Fix timeout issue in unit tests caused by longger NN down time
> ---------------------------------------------------------------------
>
>                 Key: HDFS-11338
>                 URL: https://issues.apache.org/jira/browse/HDFS-11338
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Wei Zhou
>            Assignee: Wei Zhou
>         Attachments: HDFS-11338-HDFS-10285.00.patch, HDFS-11338-HDFS-10285.01.patch, HDFS-11338-HDFS-10285-02.patch, HDFS-11338-HDFS-10285-03.patch
>
>
> As discussed in HDFS-11186, it takes longer to stop the NN because of the following join:
> {code}
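> try {
>   // Waits up to 3 seconds for the SPS thread to exit.
>   storagePolicySatisfierThread.join(3000);
> } catch (InterruptedException ie) {
>   // Restore the interrupt status rather than swallowing it.
>   Thread.currentThread().interrupt();
> }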
> {code}
> As a result, some tests take longer to finish, which leads to timeout failures.


