Date: Wed, 17 Jul 2013 21:56:48 +0000 (UTC)
From: "Omkar Vinit Joshi (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-906) TestNMClient.testNMClientNoCleanupOnStop fails occasionally

[ https://issues.apache.org/jira/browse/YARN-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711683#comment-13711683 ]

Omkar Vinit Joshi commented on YARN-906:
----------------------------------------

What you are saying above makes complete sense. That is definitely a problem, caused by the mismatch between the dispatcher queue processing events and the executor actually launching the thread. We should probably move the whole computation of the call() method inside the try{} catch{} block and check the flag status right at the beginning.
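A minimal sketch of the "check the flag first, keep everything inside try/catch" idea. Assumptions: `LaunchTask`, `shouldLaunch`, `cancelBeforeLaunch()`, `runContainer()` and the event strings are all hypothetical and simplified; this is not YARN's actual `ContainerLaunch` code. The cleanup path is assumed to claim the flag instead of relying only on cancelling the `Future` (a `FutureTask` cancelled before it starts never runs `call()`, so it would post no event at all). An `AtomicBoolean.compareAndSet` stands in for the locking mentioned in the thread, so exactly one of launch or cleanup wins.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified stand-in for a container-launch task (not YARN's ContainerLaunch).
class LaunchTask implements Callable<Integer> {
    // Hypothetical flag: true until either the launch path or the cleanup path claims it.
    private final AtomicBoolean shouldLaunch = new AtomicBoolean(true);
    // Stand-in for events posted to the dispatcher queue.
    final List<String> events = new CopyOnWriteArrayList<>();

    // Cleanup path: returns true only if the launch has not started yet.
    boolean cancelBeforeLaunch() {
        return shouldLaunch.compareAndSet(true, false);
    }

    // Hypothetical placeholder for the real launch work.
    protected int runContainer() throws Exception {
        return 0;
    }

    @Override
    public Integer call() {
        try {
            // Check the flag at the very beginning, atomically, so launch and cleanup cannot both win.
            if (!shouldLaunch.compareAndSet(true, false)) {
                events.add("KILLED_BEFORE_LAUNCH");
                return -1;
            }
            int exitCode = runContainer();
            events.add(exitCode == 0 ? "EXITED_WITH_SUCCESS" : "EXITED_WITH_FAILURE");
            return exitCode;
        } catch (Exception e) {
            // Any failure inside the task still produces exactly one event.
            events.add("LAUNCH_FAILED");
            return -1;
        }
    }

    public static void main(String[] args) {
        LaunchTask raced = new LaunchTask();
        raced.cancelBeforeLaunch(); // cleanup wins before the executor gets to call()
        raced.call();
        LaunchTask normal = new LaunchTask();
        normal.call();
        // prints "[KILLED_BEFORE_LAUNCH] [EXITED_WITH_SUCCESS]"
        System.out.println(raced.events + " " + normal.events);
    }
}
```

The point of the design is that the task always posts exactly one terminal event, whichever side claims the flag first, so the dispatcher never waits on an event that will not come.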
Updating the flag status definitely needs locking. An alternative solution that seems the most logical to me: what if we send the same event from the place where we cancel the thread, and expect/ignore the additional event in the KILLING state? I haven't thought it through much, but it is worth considering as an alternative. Thoughts?

[~vinodkv] What surprises me here is our single dispatcher thread model :( We can really see multiple issues if a client request arrives somewhere in the middle of a state transition and cancels part of the expected code path, breaking the expected state transitions.

BTW, interesting finding, [~zjshen] :)

> TestNMClient.testNMClientNoCleanupOnStop fails occasionally
> -----------------------------------------------------------
>
>                 Key: YARN-906
>                 URL: https://issues.apache.org/jira/browse/YARN-906
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-906.1.patch
>
>
> See https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/
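The alternative raised in the comment (the cancelling site sends the same event itself, and the state machine tolerates the extra copy once the container is KILLING) can be sketched as a tiny single-threaded state machine. Everything here is hypothetical and heavily simplified: `ContainerFsm`, its states, the `LAUNCHED`/`KILL`/`KILLED` events and the `ignored` counter are illustrations, not YARN's `ContainerImpl` or its `StateMachineFactory` transitions. Late or duplicate events are counted and dropped instead of raising an invalid-transition error.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified container state machine driven by a single dispatcher thread.
class ContainerFsm {
    enum State { NEW, RUNNING, KILLING, DONE }
    enum Event { LAUNCHED, KILL, KILLED }

    State state = State.NEW;
    int ignored = 0; // number of late/duplicate events that were tolerated

    void handle(Event e) {
        switch (state) {
            case NEW:
                if (e == Event.LAUNCHED) { state = State.RUNNING; return; }
                if (e == Event.KILL) { state = State.KILLING; return; }
                break;
            case RUNNING:
                if (e == Event.KILL) { state = State.KILLING; return; }
                break;
            case KILLING:
                if (e == Event.KILLED) { state = State.DONE; return; }
                // Launch thread started late and raced with cancellation: expected, ignore it.
                if (e == Event.LAUNCHED) { ignored++; return; }
                break;
            case DONE:
                // Both the cancelling site and the launch thread may report; drop the extra copy.
                if (e == Event.KILLED || e == Event.LAUNCHED) { ignored++; return; }
                break;
        }
        throw new IllegalStateException("Invalid event " + e + " at " + state);
    }

    public static void main(String[] args) {
        Deque<Event> queue = new ArrayDeque<>();
        queue.add(Event.KILL);
        queue.add(Event.LAUNCHED); // launch thread started late
        queue.add(Event.KILLED);   // sent by the cancelling site
        queue.add(Event.KILLED);   // also sent by the launch thread
        ContainerFsm fsm = new ContainerFsm();
        while (!queue.isEmpty()) {
            fsm.handle(queue.poll());
        }
        // prints "DONE ignored=2"
        System.out.println(fsm.state + " ignored=" + fsm.ignored);
    }
}
```

Genuinely unexpected events still throw, so tolerating the known duplicates does not hide other broken transitions.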