hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
Date Thu, 02 Jun 2016 06:05:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311802#comment-15311802 ]

Varun Saxena commented on YARN-5156:
------------------------------------

[~vrushalic], thanks for the patch.
Regarding the patch, I do not think we should rely on the container exit status to decide the container state.
The issue here is not restricted to the container exit status being INVALID (-1000). The issue is that we send ApplicationContainerFinishedEvent, with the container state in it, from ContainerImpl before the state machine transition is complete. Until the transition completes, the container state won't change to DONE (which would then be interpreted as COMPLETE).
This means that even when the container exit status is not -1000, the container state is interpreted as RUNNING (refer to ContainerImpl#getCurrentState). Even in the snapshot Li posted while raising the JIRA, the container exit code is 0 but the state is RUNNING.
{code}
events: [
{
id: "YARN_CONTAINER_FINISHED",
timestamp: 1464213765890,
info: {
YARN_CONTAINER_EXIT_STATUS: 0,
YARN_CONTAINER_STATE: "RUNNING",
YARN_CONTAINER_DIAGNOSTICS_INFO: ""
}
}
]
{code}
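To make the ordering problem concrete, here is a heavily simplified Java model. `SimplifiedContainer` and its methods are hypothetical, not YARN code. It assumes, as described above, that the finished event is emitted from the hook of the transition into DONE, and that the new state is stored only after the hook returns, so a getCurrentState()-style lookup inside the hook still sees the state being left.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the race; NOT the real ContainerImpl.
class SimplifiedContainer {
    enum InternalState { RUNNING, EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, DONE }

    enum ReportedState { RUNNING, COMPLETE }

    private InternalState current = InternalState.RUNNING;

    // Stand-in for the timeline publisher: the state carried by each finished event.
    final List<ReportedState> publishedStates = new ArrayList<>();

    // Same idea as ContainerImpl#getCurrentState: only DONE is reported as COMPLETE.
    ReportedState getCurrentState() {
        return current == InternalState.DONE ? ReportedState.COMPLETE : ReportedState.RUNNING;
    }

    // Like a state-machine arc: the hook runs first, and the post-state is
    // stored only after the hook returns.
    private void doTransition(InternalState postState, Runnable hook) {
        hook.run();
        current = postState;
    }

    void exited(boolean success) {
        doTransition(success ? InternalState.EXITED_WITH_SUCCESS
                             : InternalState.EXITED_WITH_FAILURE, () -> { });
    }

    // Assumed: the finished event is emitted from the hook of the arc into DONE,
    // so getCurrentState() still sees EXITED_* and reports RUNNING.
    void resourcesCleanedUp() {
        doTransition(InternalState.DONE, () -> publishedStates.add(getCurrentState()));
    }
}
```

Calling exited(true) and then resourcesCleanedUp() records RUNNING in publishedStates, while getCurrentState() afterwards returns COMPLETE, which is the same mismatch seen in the snapshot.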

To resolve this issue, we can either hardcode the container state as COMPLETE (in NMTimelinePublisher), or drop the YARN_CONTAINER_STATE info altogether. We currently publish this info only for the container finished event, and we know it would lead to a value of COMPLETE irrespective of the exit code. Even if the container exits with a failure, the eventual transition will be to DONE (which will be converted to COMPLETE) after resources are cleaned up (refer to the transitions from EXITED_WITH_FAILURE in ContainerImpl). In any case, the only two container states that can be reported to ATSv2 are RUNNING and COMPLETE, and we should be able to tell which one applies from the event that was published. So do we really need this info at all?
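A minimal Java sketch of the two options above, assuming a hypothetical helper class `FinishedEventInfo`. It is not part of NMTimelinePublisher; only the info key names are taken from the published entity. Option 1 always writes COMPLETE; option 2 omits the key and lets a reader infer the state from the event ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the two options; not NMTimelinePublisher's API.
// The key names are the ones visible in the published entity.
class FinishedEventInfo {
    static final String EXIT_STATUS = "YARN_CONTAINER_EXIT_STATUS";
    static final String STATE = "YARN_CONTAINER_STATE";
    static final String DIAGNOSTICS = "YARN_CONTAINER_DIAGNOSTICS_INFO";
    static final String FINISHED_EVENT = "YARN_CONTAINER_FINISHED";

    // Option 1: always report COMPLETE, since the container reaches DONE
    // after cleanup whatever its exit code.
    static Map<String, Object> withHardcodedState(int exitStatus, String diagnostics) {
        Map<String, Object> info = withoutState(exitStatus, diagnostics);
        info.put(STATE, "COMPLETE");
        return info;
    }

    // Option 2: publish only the exit status and diagnostics.
    static Map<String, Object> withoutState(int exitStatus, String diagnostics) {
        Map<String, Object> info = new HashMap<>();
        info.put(EXIT_STATUS, exitStatus);
        info.put(DIAGNOSTICS, diagnostics);
        return info;
    }

    // With option 2, a reader derives the state from the event ID:
    // only the finished event implies COMPLETE.
    static String inferState(String eventId) {
        return FINISHED_EVENT.equals(eventId) ? "COMPLETE" : "RUNNING";
    }
}
```

Option 2 keeps the published entity smaller but moves the RUNNING/COMPLETE interpretation to every consumer; option 1 keeps the field but makes it redundant with the event ID.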
We can also consider letting ContainerImpl fill in the correct value of the state (in case ContainerState expands in the future, though that is unlikely). In code we know which state the state machine will transition to, so we can fill in the correct value based on that.
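For this third option, a hypothetical sketch (`TargetStateReporting` and its methods are illustrative names, not ContainerImpl code) of computing the reported state from the post-transition state that the transition already knows, instead of querying the current state mid-transition. The internal-to-reported mapping is kept in one switch, so a future ContainerState value would only need a new case there.

```java
// Hypothetical sketch; NOT the real ContainerImpl.
class TargetStateReporting {
    enum InternalState { RUNNING, EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, DONE }

    enum ReportedState { RUNNING, COMPLETE }

    // One place that maps internal states to reported ones; a new
    // ContainerState value would only need a new case here.
    static ReportedState toReported(InternalState s) {
        switch (s) {
            case DONE:
                return ReportedState.COMPLETE;
            default:
                return ReportedState.RUNNING;
        }
    }

    private InternalState current = InternalState.EXITED_WITH_SUCCESS;
    ReportedState lastPublished;

    // The arc into DONE knows its post-state, so the finished event is filled
    // from that target rather than from the state being left.
    void resourcesCleanedUp() {
        InternalState postState = InternalState.DONE;
        lastPublished = toReported(postState); // stand-in for publishing the event
        current = postState;
    }

    InternalState current() {
        return current;
    }
}
```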

> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -------------------------------------------------------------------------
>
>                 Key: YARN-5156
>                 URL: https://issues.apache.org/jira/browse/YARN-5156
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Vrushali C
>         Attachments: YARN-5156-YARN-2928.01.patch
>
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Did we design this deliberately, or is it a bug?
> {code}
> {
> metrics: [ ],
> events: [
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: 1464213765890,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: 1464213761133,
> info: { }
> },
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: 1464213761132,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: 1464213761132,
> info: { }
> }
> ],
> id: "container_e15_1464213707405_0001_01_000018",
> type: "YARN_CONTAINER",
> createdtime: 1464213761132,
> info: {
> YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
> YARN_CONTAINER_ALLOCATED_VCORE: 1,
> YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
> UID: "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_000018",
> YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
> YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
> SYSTEM_INFO_PARENT_ENTITY: {
> type: "YARN_APPLICATION_ATTEMPT",
> id: "appattempt_1464213707405_0001_000001"
> },
> YARN_CONTAINER_ALLOCATED_PORT: 64694
> },
> configs: { },
> isrelatedto: { },
> relatesto: { }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


