Date: Tue, 24 Sep 2013 03:40:05 +0000 (UTC)
From: "Zhijie Shen (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered

[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated MAPREDUCE-5505:
-----------------------------------
    Attachment: MAPREDUCE-5505.4.patch

Thanks [~bikassaha] for reviewing the patch. I've updated it accordingly.

bq. Typo - already ...
Fixed

bq. Let's use a true value to be back-compatible in case it gets used.
Fixed

bq. Typo - mock
Fixed

bq. Please put a comment for this non-obvious code. We need it because no one is calling shutdown job, right?
Alternatively, can MRApp.createJob() be changed to call MRAppMaster.shutdown() or set the boolean value to true? This would be closer to ideal than the current approach.
Agreed, the code is non-obvious. Instead of moving the setting of safeToReportTerminationToUser into MRApp.createJob(), I moved it to the constructor of MRApp, because safeToReportTerminationToUser is a per-MRAppMaster variable.

bq. Can we have a test that verifies the main straight-line case? Job succeeds and returns running until the boolean is set.
Added TestMRApp#testJobSuccess

bq. How can we be sure that the previous state == RUNNING?
It's a general issue. To solve it for all the transitions of JobImpl, I added code to remember the last non-final state. Then, whenever safeToReportTerminationToUser is false, JobImpl returns the stored previous state instead of the final state, i.e., SUCCEEDED, FAILED, KILLED, or ERROR.

bq. Has this been tested on a single node cluster with a real job?
Tested locally. The job client saw RUNNING until the AM got unregistered.

> Clients should be notified job finished only after job successfully unregistered
> ---------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5505
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Zhijie Shen
>         Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch, MAPREDUCE-5505.3.patch, MAPREDUCE-5505.4.patch
>
>
> This is to make sure the user is notified that the job has finished only after the job is really done. This does increase client latency but can reduce some races during unregister, like YARN-540.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
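
For readers of the archive, the state-masking idea described in the comment above (remember the last non-final state, and report it while safeToReportTerminationToUser is false) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in MAPREDUCE-5505.4.patch: the JobState enum, the MaskedJob class, and the method names transitionTo, markSafeToReport, and getExternalState are hypothetical stand-ins for the real JobImpl state machine.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified job states; the real JobImpl has more internal states.
enum JobState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

// Hypothetical stand-in for JobImpl, showing only the masking logic.
class MaskedJob {
    private static final Set<JobState> FINAL_STATES =
        EnumSet.of(JobState.SUCCEEDED, JobState.FAILED, JobState.KILLED, JobState.ERROR);

    private JobState internalState = JobState.NEW;
    private JobState lastNonFinalState = JobState.NEW;
    // Assumed to be flipped once the AM has unregistered successfully.
    private boolean safeToReportTerminationToUser = false;

    // Record every transition; remember the most recent non-final state.
    synchronized void transitionTo(JobState next) {
        internalState = next;
        if (!FINAL_STATES.contains(next)) {
            lastNonFinalState = next;
        }
    }

    synchronized void markSafeToReport() {
        safeToReportTerminationToUser = true;
    }

    // State shown to clients: a final state is hidden until it is safe to report.
    synchronized JobState getExternalState() {
        if (FINAL_STATES.contains(internalState) && !safeToReportTerminationToUser) {
            return lastNonFinalState;
        }
        return internalState;
    }
}
```

In this sketch, a job that moves from RUNNING to SUCCEEDED keeps reporting RUNNING to clients until markSafeToReport() is called, which matches the behavior reported from the local test: the job client saw RUNNING until the AM got unregistered.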