Date: Mon, 10 Apr 2017 22:50:41 +0000 (UTC)
From: "Uma Maheswara Rao G (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11338) [SPS]: Fix timeout issue in unit tests caused by longger NN down time

    [ https://issues.apache.org/jira/browse/HDFS-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963614#comment-15963614 ]

Uma Maheswara Rao G commented on HDFS-11338:
--------------------------------------------

It is good to see the failures fixed now. However, I have a few comments on the changes.

join() is a Thread method, so exposing a method with that name directly on non-thread classes like BlockStorageMovementAttemptedItems may not be appropriate, IMO. How about having another method called 'disable' instead of join? This method would interrupt the internal threads and disable the functionality, e.g. it can set the running flags to false and interrupt the threads.

Then rename the current stop method to stopGracefully(). This method should do the following: if the thread is already running, interrupt it and then join. If it is not running, just join so that the stop is graceful.
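As a rough illustration of the disable/stopGracefully split suggested above, here is a minimal Java sketch. The class StoppableMonitor, its worker loop, and the helper methods isRunning() and isWorkerAlive() are hypothetical stand-ins for illustration only; this is not the actual BlockStorageMovementAttemptedItems code or the patch under review.

```java
// Hypothetical stand-in for a non-thread class (such as
// BlockStorageMovementAttemptedItems) that owns an internal worker thread.
class StoppableMonitor {
  private volatile boolean running = false;
  private volatile Thread worker;

  synchronized void start() {
    if (running) {
      return;
    }
    running = true;
    worker = new Thread(() -> {
      while (running) {
        try {
          Thread.sleep(60_000); // placeholder for the periodic work
        } catch (InterruptedException ie) {
          // Woken up by disable(); the loop condition decides whether to exit.
        }
      }
    }, "monitor-worker");
    worker.setDaemon(true);
    worker.start();
  }

  // Non-graceful step: clear the running flag and interrupt, without waiting.
  synchronized void disable() {
    running = false;
    if (worker != null) {
      worker.interrupt();
    }
  }

  // Graceful stop: disable first if still running, then join the worker.
  void stopGracefully() {
    Thread t;
    synchronized (this) {
      if (running) {
        disable();
      }
      t = worker;
    }
    if (t == null) {
      return; // never started, nothing to wait for
    }
    try {
      t.join();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
  }

  boolean isRunning() {
    return running;
  }

  boolean isWorkerAlive() {
    Thread t = worker;
    return t != null && t.isAlive();
  }

  public static void main(String[] args) {
    StoppableMonitor m = new StoppableMonitor();
    m.start();
    m.disable();        // fast, does not wait for the worker
    m.stopGracefully(); // waits until the worker has exited
    System.out.println("worker alive after stop: " + m.isWorkerAlive());
  }
}
```

In this sketch disable() returns immediately, while stopGracefully() is safe to call whether or not disable() was called first, since it only interrupts when the running flag is still set.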
So, if you want a two-step stop to save time, first call disable() (this is not a graceful stop), then interrupt the other system threads, and finally call stopGracefully(). That last call makes sure the stop is graceful: it calls disable() if the component is not disabled already, and then joins.

1. Use stopGracefully() for the dynamic start/stop feature.
2. Use the two-step stop for the NN start/stop case to optimize time.

Thoughts?

> [SPS]: Fix timeout issue in unit tests caused by longger NN down time
> ---------------------------------------------------------------------
>
>                 Key: HDFS-11338
>                 URL: https://issues.apache.org/jira/browse/HDFS-11338
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Wei Zhou
>            Assignee: Wei Zhou
>         Attachments: HDFS-11338-HDFS-10285.00.patch, HDFS-11338-HDFS-10285.01.patch, HDFS-11338-HDFS-10285-02.patch, HDFS-11338-HDFS-10285-03.patch
>
>
> As discussed in HDFS-11186, it takes longer to stop NN:
> {code}
> try {
>   storagePolicySatisfierThread.join(3000);
> } catch (InterruptedException ie) {
> }
> {code}
> So, it takes longer time to finish some tests and this leads to the timeout failures.