hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6479) TestDistributedShell.testDSShellWithoutDomainV1_5 fails
Date Wed, 30 Aug 2017 15:09:03 GMT

    [ https://issues.apache.org/jira/browse/YARN-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147402#comment-16147402

Jason Lowe commented on YARN-6479:

I saw another case of this. The test fails because EntityGroupFSTimelineStore can move files
from the active directory to the done directory before the application has finished writing
its data. This can happen while the application is in the FINISHING state, which clients see
as the FINISHED state in an app report. EntityGroupFSTimelineStore sees from the app report
that the application has finished and assumes the entity files are no longer being written,
when in fact the app is still FINISHING and the AM is busy writing the entities out to HDFS.
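A minimal, hypothetical illustration of the race described above. It assumes simplified stand-in enums (InternalAppState, ReportedAppState) rather than the real RM classes, and only models the one property the comment states: an app in FINISHING is reported to clients as FINISHED, so a mover that trusts the app report alone cannot tell "AM still writing" from "AM done".

```java
/**
 * Hypothetical sketch of the app-report race. The enums are simplified
 * stand-ins, not the real YARN ResourceManager types.
 */
public class AppReportRace {

  // Simplified, hypothetical internal states of an application.
  enum InternalAppState { RUNNING, FINISHING, FINISHED }

  // Simplified, hypothetical states as seen by clients in an app report.
  enum ReportedAppState { RUNNING, FINISHED }

  // Both FINISHING and FINISHED surface to clients as FINISHED.
  static ReportedAppState toReport(InternalAppState s) {
    switch (s) {
      case RUNNING:
        return ReportedAppState.RUNNING;
      case FINISHING: // the AM may still be writing entity files here
      case FINISHED:
        return ReportedAppState.FINISHED;
      default:
        throw new IllegalArgumentException("unknown state: " + s);
    }
  }

  // Naive mover decision that looks only at the app report.
  static boolean naiveShouldMove(InternalAppState s) {
    return toReport(s) == ReportedAppState.FINISHED;
  }

  public static void main(String[] args) {
    // Prints true even though the AM may still be writing entities.
    System.out.println(naiveShouldMove(InternalAppState.FINISHING));
  }
}
```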

The good news is that no data is lost, since HDFS supports renaming files that are actively
being written, but it can cause this unit test to fail because the test assumes files in the
done directory are complete. Either we need to fix the test to account for this race, or we
need to fix EntityGroupFSTimelineStore so it does not try to move files for applications that
are still active. Fixing the latter requires changing EntityGroupFSTimelineStore to also fetch
an app attempt report for the current attempt and check whether that attempt is in a terminal
state (i.e.: FINISHED, FAILED, or KILLED, but not FINISHING). If it is not, then the app is
really still actively writing entity files and the store should not move them from active to done.
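A minimal sketch of the proposed check, assuming a hypothetical AttemptState enum that mirrors the names of some YARN YarnApplicationAttemptState values so the example compiles without Hadoop on the classpath. In the real store the attempt state would come from the current attempt's report (for example via YarnClient#getApplicationAttemptReport(...).getYarnApplicationAttemptState()); that wiring is not shown and DoneDirMoveCheck/isSafeToMove are illustrative names only.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical sketch: only move entity files to the done directory when
 * the current app attempt is really in a terminal state.
 */
public class DoneDirMoveCheck {

  // Hypothetical mirror of the attempt states relevant to this check.
  enum AttemptState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

  // States in which the AM has stopped writing entity files. FINISHING is
  // deliberately excluded: the app report can already say FINISHED while
  // the AM is still flushing entities to HDFS.
  private static final Set<AttemptState> TERMINAL =
      EnumSet.of(AttemptState.FINISHED, AttemptState.FAILED, AttemptState.KILLED);

  /**
   * Returns true only when the app report says the app is finished AND the
   * current attempt is terminal, so the move cannot race with AM writes.
   */
  static boolean isSafeToMove(boolean appReportSaysFinished, AttemptState currentAttempt) {
    return appReportSaysFinished && TERMINAL.contains(currentAttempt);
  }

  public static void main(String[] args) {
    // App report reads FINISHED but the attempt is still FINISHING: do not move.
    System.out.println(isSafeToMove(true, AttemptState.FINISHING)); // false
    // The attempt has really finished: safe to move.
    System.out.println(isSafeToMove(true, AttemptState.FINISHED)); // true
  }
}
```

Checking the attempt in addition to the app report keeps the existing behavior for genuinely finished apps while closing the FINISHING window.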

> TestDistributedShell.testDSShellWithoutDomainV1_5 fails
> -------------------------------------------------------
>                 Key: YARN-6479
>                 URL: https://issues.apache.org/jira/browse/YARN-6479
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Eric Badger
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:385)
> 	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV1_5(TestDistributedShell.java:236)
> {noformat}
> This particular run was on 2.8, but the failure may also be present up through trunk.

This message was sent by Atlassian JIRA
