hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antal Bálint Steinbach (JIRA) <j...@apache.org>
Subject [jira] [Commented] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications
Date Thu, 11 Oct 2018 13:56:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646449#comment-16646449
] 

Antal Bálint Steinbach commented on YARN-8775:
----------------------------------------------

Hi [~haibochen] ,

Thanks for the review.

The problem is that if I disable periodical health check this will be always true:
{code:java}
public boolean areDisksHealthy() {
 if (!isDiskHealthCheckerEnabled) {
 return true;
 }
...
}{code}
And we will lose this part of the test:
{code:java}
Assert.assertEquals("Node's health in terms of disks is wrong",
isHealthy, dirsHandler.areDisksHealthy());{code}
The reason for the retry is that it is possible that the folder check is happening during
the prepareDirToFail, so it can fail in that case. Obviously, your idea fixes these kinds
of concurrency issues, what makes this test so fragile.

> TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8775
>                 URL: https://issues.apache.org/jira/browse/YARN-8775
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test, yarn
>    Affects Versions: 3.0.0
>            Reporter: Antal Bálint Steinbach
>            Assignee: Antal Bálint Steinbach
>            Priority: Major
>         Attachments: YARN-8775.001.patch, YARN-8775.002.patch
>
>
> The test can fail sometimes when file operations were done during the check done by the
thread in _LocalDirsHandlerService._
> {code:java}
> java.lang.AssertionError: NodeManager could not identify disk failure.
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
> 	at org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
> 	at org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)
> Stderr
> 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures (TestDiskFailures.java:prepareDirToFail(277))
- Prepared /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
to fail.
> 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures (TestDiskFailures.java:prepareDirToFail(277))
- Prepared /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
to fail.
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] nodemanager.DirectoryCollection
(DirectoryCollection.java:checkDirs(283)) - Directory /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
error, Not a directory: /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1,
removing from list of valid directories
> 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] localizer.ResourceLocalizationService
(ResourceLocalizationService.java:initializeLogDir(1329)) - Could not initialize log dir /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> java.io.FileNotFoundException: Destination exists and is not a directory: /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515)
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081)
> at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1318)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:141)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:269)
> at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:317)
> at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:452)
> at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$500(LocalDirsHandlerService.java:52)
> at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:166)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> 2018-09-13 08:21:59,824 INFO [main] server.TestDiskFailures (TestDiskFailures.java:verifyDisksHealth(237))
- ExpectedDirs=
> 2018-09-13 08:21:59,825 INFO [main] server.TestDiskFailures (TestDiskFailures.java:verifyDisksHealth(238))
- SeenDirs=/tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message