hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7048) AM can still crash after MAPREDUCE-7020
Date Thu, 08 Feb 2018 17:11:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357235#comment-16357235 ]

Jason Lowe commented on MAPREDUCE-7048:
---------------------------------------

Thanks for updating the patch!

Now that we're looking up the uberized setting on every status update, I think it makes sense
to do the lookup just once when the task is configured (i.e., make it a field that is
initialized in the setConf method).  Then we don't have to do conf key lookups every time we
do a status update.
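
A minimal sketch of the "initialize it once in setConf" idea. It assumes a plain Map as a
stand-in for the Hadoop Configuration; the class UberAwareTask and the key string are
illustrative assumptions, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Task; only the caching pattern matters here.
class UberAwareTask {
    // Assumed key name, used for illustration only.
    static final String UBER_KEY = "mapreduce.task.uberized";

    private Map<String, String> conf;
    private boolean uberized;   // resolved once, read on every status update

    // Mirrors the idea of setConf: resolve the flag when the task is configured.
    void setConf(Map<String, String> conf) {
        this.conf = conf;
        this.uberized = Boolean.parseBoolean(conf.getOrDefault(UBER_KEY, "false"));
    }

    // Hot path: a field read instead of a configuration lookup.
    boolean isUberized() {
        return uberized;
    }
}
```

The trade-off is that later changes to the configuration are not seen by the task, which
should be acceptable if the flag is fixed for the lifetime of the task.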

Rather than mess with the security manager, it would be simpler to change the System.exit calls
to use ExitUtil.terminate.  Task already does this in another place, and arguably
it should be consistent.  Then the test for non-uber mode can be just as simple as the uber
test: make sure ExitUtil.disableSystemExit is called and add {{expected=ExitException.class}}
to the Test annotation.
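
To illustrate why this makes the test simple, here is a self-contained sketch. SketchExitUtil
is a simplified, hypothetical stand-in for org.apache.hadoop.util.ExitUtil (it is not the
Hadoop class), and SketchTask only imitates the relevant branch of Task.statusUpdate.

```java
// Simplified, hypothetical stand-in for Hadoop's ExitUtil.
final class SketchExitUtil {
    static final class ExitException extends RuntimeException {
        final int status;
        ExitException(int status, String msg) {
            super(msg);
            this.status = status;
        }
    }

    private static volatile boolean systemExitDisabled = false;

    // Tests call this so that terminate() throws instead of killing the JVM.
    static void disableSystemExit() {
        systemExitDisabled = true;
    }

    static void terminate(int status, String msg) {
        if (systemExitDisabled) {
            throw new ExitException(status, msg);
        }
        System.exit(status);
    }
}

// Hypothetical stand-in for the "parent died" branch of Task.statusUpdate.
class SketchTask {
    // taskFound plays the role of umbilical.statusUpdate(...).getTaskFound().
    void statusUpdate(boolean taskFound) {
        if (!taskFound) {
            SketchExitUtil.terminate(66, "Parent died.  Exiting");
        }
    }
}
```

With exit disabled, a test only needs to call statusUpdate with a "task not found" response
and expect the exception; no custom security manager is required.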


> AM can still crash after MAPREDUCE-7020
> ---------------------------------------
>
>                 Key: MAPREDUCE-7048
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7048
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: MAPREDUCE-7048-001.patch, MAPREDUCE-7048-002.patch
>
>
> The testcase TestUberAM#testThreadDumpOnTaskTimeout was supposed to be fixed by MAPREDUCE-7020.
> However, it still fails, see: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7325/testReport/junit/org.apache.hadoop.mapreduce.v2/TestMRJobs/testThreadDumpOnTaskTimeout/
> (note: other tests failed as well, but those look unrelated).
> When I tried to reproduce it locally, it failed again, although with a slightly different
> error message (it was actually the same as before):
> {noformat}
> [INFO] -------------------------------------------------------
> [INFO]  T E S T S
> [INFO] -------------------------------------------------------
> [INFO] Running org.apache.hadoop.mapreduce.v2.TestUberAM
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 128.192 s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestUberAM
> [ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestUberAM)  Time elapsed: 79.539 s  <<< FAILURE!
> java.lang.AssertionError: No AppMaster log found! expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> *Root cause:* {{System.exit()}} is still invoked at {{Task.statusUpdate()}}
> {noformat}
>   public void statusUpdate(TaskUmbilicalProtocol umbilical) 
>   throws IOException {
>     int retries = MAX_RETRIES;
>     while (true) {
>       try {
>         if (!umbilical.statusUpdate(getTaskID(), taskStatus).getTaskFound()) {
>           LOG.warn("Parent died.  Exiting "+taskId);
>           System.exit(66);
>         }
>         taskStatus.clearStatus();
>         return;
>         ...
> {noformat}
> At this point, the task was not found, so {{umbilical.statusUpdate(...).getTaskFound()}}
> returns false. Checking whether we run in uber mode seems to solve the problem.




