Date: Wed, 1 Jun 2016 23:37:59 +0000 (UTC)
From: "Vrushali C (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

    [ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311369#comment-15311369 ]

Vrushali C commented on YARN-5156:
----------------------------------

While the change in the NM may be broader, I think we have two options at our end:
1. not store the final status in the container finished event
2. store it as complete if it's incorrectly set.

For #1, it sort of leads to inconsistent contents across events. With hRaven, the experience has been that applications that used hRaven, like reducer estimation, would query for "finished" or "completed" jobs, and if such fields were missing, it led to incorrect calculations at their end. For example, in hRaven I patched a very similar issue: the jobStatus field seemed to occur only in JOB_INITED, JOB_KILLED and JOB_FAILED events. It was missing from the JOB_FINISHED event. The jobStatus field should be a part of the JOB_FINISHED event in the history file when it's generated.
https://github.com/twitter/hraven/issues/72

So my vote is for at least ensuring we store some consistent information for all events. In the case of container finished, it is reasonable to think that the container had completed.

I am attaching a very basic patch. I am still checking if that helps fix this issue in my pseudo-distributed cluster.

> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -------------------------------------------------------------------------
>
>                 Key: YARN-5156
>                 URL: https://issues.apache.org/jira/browse/YARN-5156
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>         Attachments: YARN-5156-YARN-2928.01.patch
>
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do we design this deliberately or it's a bug?
> {code}
> {
>   metrics: [ ],
>   events: [
>     {
>       id: "YARN_CONTAINER_FINISHED",
>       timestamp: 1464213765890,
>       info: {
>         YARN_CONTAINER_EXIT_STATUS: 0,
>         YARN_CONTAINER_STATE: "RUNNING",
>         YARN_CONTAINER_DIAGNOSTICS_INFO: ""
>       }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
>       timestamp: 1464213761133,
>       info: { }
>     },
>     {
>       id: "YARN_CONTAINER_CREATED",
>       timestamp: 1464213761132,
>       info: { }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
>       timestamp: 1464213761132,
>       info: { }
>     }
>   ],
>   id: "container_e15_1464213707405_0001_01_000018",
>   type: "YARN_CONTAINER",
>   createdtime: 1464213761132,
>   info: {
>     YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
>     YARN_CONTAINER_ALLOCATED_VCORE: 1,
>     YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
>     UID: "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_000018",
>     YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
>     YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
>     SYSTEM_INFO_PARENT_ENTITY: {
>       type: "YARN_APPLICATION_ATTEMPT",
>       id: "appattempt_1464213707405_0001_000001"
>     },
>     YARN_CONTAINER_ALLOCATED_PORT: 64694
>   },
>   configs: { },
>   isrelatedto: { },
>   relatesto: { }
> }
> {code}
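
For illustration only, the second option from the comment (store the state as complete when the reported state is something else) might look roughly like the sketch below. This is not the attached patch. The class `ContainerFinishedInfo`, the method `finishedEventInfo`, and the local `ContainerState` enum are hypothetical stand-ins so the example runs without Hadoop on the classpath; only the three info keys are taken from the entity dump in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not the YARN-5156 patch and not real NM or timeline code.
public class ContainerFinishedInfo {

    // Local stand-in for a container state enum, so the example is self-contained.
    enum ContainerState { NEW, RUNNING, COMPLETE }

    // Info keys as they appear in the YARN_CONTAINER_FINISHED event in the issue.
    static final String EXIT_STATUS_KEY = "YARN_CONTAINER_EXIT_STATUS";
    static final String STATE_KEY = "YARN_CONTAINER_STATE";
    static final String DIAGNOSTICS_KEY = "YARN_CONTAINER_DIAGNOSTICS_INFO";

    // Builds the info map for a finished event. A finished container is
    // assumed to have completed, so any other reported state (e.g. RUNNING)
    // is overridden with COMPLETE before it is stored.
    static Map<String, Object> finishedEventInfo(
            ContainerState reported, int exitStatus, String diagnostics) {
        ContainerState state =
                (reported == ContainerState.COMPLETE) ? reported : ContainerState.COMPLETE;

        Map<String, Object> info = new LinkedHashMap<>();
        info.put(EXIT_STATUS_KEY, exitStatus);
        info.put(STATE_KEY, state.name());
        info.put(DIAGNOSTICS_KEY, diagnostics == null ? "" : diagnostics);
        return info;
    }

    public static void main(String[] args) {
        // Same inputs as the event in the issue: exit status 0, state RUNNING.
        Map<String, Object> info = finishedEventInfo(ContainerState.RUNNING, 0, "");
        System.out.println(info.get(STATE_KEY)); // prints COMPLETE
    }
}
```

The trade-off matches the one raised in the comment: the finished event always carries a state field, so consumers that query for completed containers see consistent contents, at the cost of masking whatever state was actually reported.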