hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11338) [SPS]: Fix timeout issue in unit tests caused by longger NN down time
Date Mon, 10 Apr 2017 22:50:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963614#comment-15963614 ]

Uma Maheswara Rao G commented on HDFS-11338:

It is good to see the failures fixed now. However, I have a few comments on the changes.
join is a Thread method, so keeping a method with this name directly in non-thread classes like BlockStorageMovementAttemptedItems
may not be appropriate, IMO.

How about having another method called 'disable' instead of join? This method would interrupt
the internal threads and disable the functionality, e.g. it can set the running flags to false and interrupt the threads.
Then rename the current stop method to stopGracefully(). This method should do the following:
if the thread is already running, then interrupt and join. If it is not running, then just join
to have a graceful stop.

So, if you want a two-step stop to save time, first call disable (this is not a graceful
stop), then interrupt the other system threads, and finally call stopGracefully (this
makes sure the stop is graceful, meaning it will call disable if it is not disabled already
and then join).
1. Use stopGracefully for the dynamic start/stop feature.
2. Use the two-step stop for the NN start/stop case to optimize time.
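
To make the proposal concrete, here is a minimal sketch of the disable/stopGracefully split. It is not taken from the patch: the class name MovementMonitor, the 60-second sleep, and the helper methods isRunning/isWorkerAlive are all hypothetical, and the real classes (e.g. BlockStorageMovementAttemptedItems) may manage their threads differently.

```java
// Hypothetical stand-in for a non-thread class that owns an internal worker thread.
class MovementMonitor {
  private volatile boolean running = false;
  private Thread worker;

  synchronized void start() {
    running = true;
    worker = new Thread(() -> {
      while (running) {
        try {
          Thread.sleep(60_000); // simulated long wait between checks
        } catch (InterruptedException e) {
          // Interrupted by disable(); the loop re-checks the running flag.
        }
      }
    }, "movement-monitor");
    worker.setDaemon(true);
    worker.start();
  }

  // Step 1 (not graceful): clear the flag, interrupt, return without waiting.
  synchronized void disable() {
    running = false;
    if (worker != null) {
      worker.interrupt();
    }
  }

  // Graceful stop: disable if not disabled already, then join the worker.
  void stopGracefully() {
    Thread t;
    synchronized (this) {
      if (running) {
        disable();
      }
      t = worker;
    }
    if (t == null) {
      return; // never started
    }
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
  }

  boolean isRunning() {
    return running;
  }

  synchronized boolean isWorkerAlive() {
    return worker != null && worker.isAlive();
  }
}
```

With this shape, the NN stop path would call disable() on each such component first, interrupt the other system threads, and only then call stopGracefully(), so the joins overlap with the rest of the shutdown instead of each one waiting in turn. The dynamic start/stop path would call stopGracefully() alone.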

> [SPS]: Fix timeout issue in unit tests caused by longger NN down time
> ---------------------------------------------------------------------
>                 Key: HDFS-11338
>                 URL: https://issues.apache.org/jira/browse/HDFS-11338
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Wei Zhou
>            Assignee: Wei Zhou
>         Attachments: HDFS-11338-HDFS-10285.00.patch, HDFS-11338-HDFS-10285.01.patch,
HDFS-11338-HDFS-10285-02.patch, HDFS-11338-HDFS-10285-03.patch
> As discussed in HDFS-11186, it takes longer to stop NN:
> {code}
> try {
>   storagePolicySatisfierThread.join(3000);
> } catch (InterruptedException ie) {
> }
> {code}
> So, it takes longer time to finish some tests and this leads to the timeout failures.

