hadoop-mapreduce-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAPREDUCE-6801) Fix flaky TestKill.testKillJob()
Date Thu, 17 Nov 2016 22:49:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675040#comment-15675040 ]

Varun Saxena edited comment on MAPREDUCE-6801 at 11/17/16 10:49 PM:
--------------------------------------------------------------------

Thanks [~haibochen] for the patch. This should handle all cases except one, which would
occur only rarely. If the job is stuck in the internal state SETUP (due to slow processing),
tasks won't be scheduled. Hence the tasks won't reach the KILLED state that the test asserts
on. An internal state of SETUP maps to an external state of RUNNING, so waiting on the external
state can return while the job is still in SETUP. Therefore {{app.waitForState(job, JobState.RUNNING)}}
should be replaced by {{app.waitForInternalState((JobImpl) job, JobStateInternal.RUNNING)}}.


I was able to simulate this case by putting a sleep in the dispatcher.
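
To illustrate why the external wait is not enough, here is a minimal, self-contained sketch. It is not Hadoop code: the enums and helper methods are hypothetical stand-ins, and the only behaviour taken from the discussion above is that an internal state of SETUP is reported externally as RUNNING.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Hadoop classes.
class SetupStateDemo {
    // Simplified internal states (assumption: only the ones needed here).
    enum JobStateInternal { NEW, SETUP, RUNNING }

    // Simplified external states (assumption: only the ones needed here).
    enum JobState { NEW, RUNNING }

    // Internal SETUP is reported externally as RUNNING, as described above.
    static JobState externalView(JobStateInternal internal) {
        return internal == JobStateInternal.NEW ? JobState.NEW : JobState.RUNNING;
    }

    // Condition checked by a wait on the external state RUNNING.
    static boolean externalWaitSatisfied(JobStateInternal current) {
        return externalView(current) == JobState.RUNNING;
    }

    // Condition checked by a wait on the internal state RUNNING.
    static boolean internalWaitSatisfied(JobStateInternal current) {
        return current == JobStateInternal.RUNNING;
    }

    public static void main(String[] args) {
        // A job stuck in SETUP, e.g. because the dispatcher is slow.
        JobStateInternal stuck = JobStateInternal.SETUP;
        System.out.println("external wait satisfied: " + externalWaitSatisfied(stuck));
        System.out.println("internal wait satisfied: " + internalWaitSatisfied(stuck));
    }
}
```

In this sketch the external check already passes for a job in SETUP, before any task has been scheduled, while the internal check holds the test back until the job is really in RUNNING.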



> Fix flaky TestKill.testKillJob()
> --------------------------------
>
>                 Key: MAPREDUCE-6801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6801
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: mapreduce6801.001.patch
>
>
> TestKill.testKillJob often fails for the same reason with the following error message:
> {code}
> 1 tests failed.
> FAILED:  org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob
> Error Message:
> Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
> Stack Trace:
> java.lang.AssertionError: Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:84)
> {code}
> The root cause is that when the job is in KILLED state from an external view, TaskKillEvents
> and TaskAttemptKillEvents placed on the event loop queue may not have been processed by the
> dispatcher thread.
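
The root cause described in the issue can be sketched with a toy event queue. This is an assumption-laden illustration, not the MapReduce dispatcher: all class, field, and method names are hypothetical, and an explicit drain() call stands in for the asynchronous dispatcher thread so that the ordering is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical toy model for illustration only; NOT the Hadoop dispatcher.
class PendingKillEventsDemo {
    enum TaskState { RUNNING, KILLED }

    static class Task {
        TaskState state = TaskState.RUNNING;
    }

    // Events wait here until the (simulated) dispatcher processes them.
    final Queue<Runnable> eventQueue = new ArrayDeque<>();
    final Task task = new Task();

    // What an external observer sees for the job.
    boolean jobLooksKilled = false;

    // Killing the job flips the external view at once,
    // but the task kill is only queued, not yet applied.
    void killJob() {
        jobLooksKilled = true;
        eventQueue.add(() -> task.state = TaskState.KILLED);
    }

    // Stand-in for the dispatcher thread catching up on its queue.
    void drain() {
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    public static void main(String[] args) {
        PendingKillEventsDemo demo = new PendingKillEventsDemo();
        demo.killJob();
        System.out.println("job killed: " + demo.jobLooksKilled
            + ", task state before drain: " + demo.task.state);
        demo.drain();
        System.out.println("task state after drain: " + demo.task.state);
    }
}
```

An assertion on the task state made between killJob() and drain() sees the old state, which corresponds to the "expected:<KILLED> but was:<...>" failure quoted above.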


