hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Adam Antal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
Date Wed, 19 Dec 2018 16:41:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725168#comment-16725168
] 

Adam Antal commented on HADOOP-15819:
-------------------------------------

Not exactly, to be more clear though I enumerate the events when calling any subtest of {{AbstractITCommitProtocol}}:
* During the very first moment of the startup {{setup()}} and {{init()}} functions are called.
Concretely {{AbstractBondedFSContract}}  creates and stores a new {{FileSystem}} instance
of the proper type.
** Later, each time the getFileSystem() is called, it returns this "cached" instance (This
is NOT the FSCache! just a private data member of {{AbstractBondedFSContract}}).
** Note: since {{AbstractITCommitProtocol.createConfiguration()}} contains {{disableFilesystemCaching(conf)}},
the cache is disabled. Each time this Configuration object is used anywhere in the test, it
returns a newly created instance by the static {{FileSystem.get()}} method.
* During this startup in {{AbstractITCommitProtocol.setup()}} the line {{bindFileSystem(fileSystem,
outDir, fileSystem.getConf());}} is executed as well. We call the internal {{getFileSystem()}},
which returns the already created and stored instance, and inject it directly into the FSCache.
* The test runs using the stored instance and the non-cached instances of FileSystem.
* During teardown we close the stored FileSystem object as usual, BUT since we didn't put
this instance into the FSCache the regular way it got stuck in the cache. Although the {{close()}}
contains this call: {{CACHE.remove(this.key, this);}}, the {{this.key}} (so the FileSystem's
FileSystem.Cache private member) is not set. It would have been set, if it had been put to
the cache in the regular way, but not by injection. More concretely {{Cache.getInternal}}
sets this.key of the FileSystem object, which is surpassed when the FS instance is manually
injected.
* Later, in the same JVM another test is setting up using the {{s3a://}} scheme, this object
gets retrieved from the cache (since it's not removed), but now it's closed, and boom.

I'd rather not mess with manual injection, that's why I proposed to delete the binding.

I wasn't 100% right with my previous comment (you can practically ignore that), but I think
this is correct way how things work. Still I can make mistakes, so please review my thoughts
on this.

> S3A integration test failures: FileSystem is closed! - without parallel test run
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-15819
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15819
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.1.1
>            Reporter: Gabor Bota
>            Assignee: Adam Antal
>            Priority: Critical
>         Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, HADOOP-15819.002.patch,
S3ACloseEnforcedFileSystem.java, S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against Amazon AWS
S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 s <<<
FAILURE! - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
 Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 s <<<
FAILURE! - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
 Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
 Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 s <<<
FAILURE! - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
 Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 s <<<
FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  Time elapsed:
0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 s <<<
FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  Time
elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is running on
top of the other - so we should not see that the tests are failing like this. The issue could
be in how we handle org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same
S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B test will get
the same FileSystem object from the cache and try to use it, but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell that the failed
regression test result is an implementation issue in the runtime code or a test implementation
problem. 
> I've checked when and what closes the S3AFileSystem with a sightly modified version of
S3AFileSystem which logs the closers of the fs if an error should occur. I'll attach this
modified java file for reference. See the next example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:checkIfClosed(74))
- Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
> 	at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
> 	at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474)
> 	at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
> 	at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193)
> 	at org.apache.hadoop.fs.s3a.ITestS3AClosedFS.setup(ITestS3AClosedFS.java:40)
> 	at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(61))
- FS was closed 6 times by:
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(64))
- ----------------- close() #1 ---------------
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(65))
- java.lang.RuntimeException: close() called here
> 	at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83)
> 	at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> 	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> 	at org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55)
> 	at org.apache.hadoop.fs.s3a.commit.AbstractCommitITest.teardown(AbstractCommitITest.java:206)
> 	at org.apache.hadoop.fs.s3a.auth.ITestAssumedRoleCommitOperations.teardown(ITestAssumedRoleCommitOperations.java:90)
> 	at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(64))
- ----------------- close() #2 ---------------
> 2018-10-04 00:52:25,596 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(65))
- java.lang.RuntimeException: close() called here
> 	at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83)
> 	at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> 	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> 	at org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55)
> 	at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2018-10-04 00:52:25,597 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(64))
- ----------------- close() #3 ---------------
> 2018-10-04 00:52:25,598 [Thread-4201] WARN  s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:logClosers(65))
- java.lang.RuntimeException: close() called here
> 	at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.close(S3ACloseEnforcedFileSystem.java:83)
> 	at org.apache.hadoop.io.IOUtils.cleanupWithLogger(IOUtils.java:280)
> 	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:298)
> 	at org.apache.hadoop.fs.s3a.AbstractS3ATestBase.teardown(AbstractS3ATestBase.java:55)
> 	at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> (...)
> {noformat}
> From the logs it seems that the closers are the teardown methods of the abstract test
classes, so the same FileSystem object gets reused between tests.
> To run a test with the modified fs the following should be in the config:
> {noformat}
> <property>
>   <name>fs.s3a.impl</name>
>   <value>org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem</value>
>   <description>The implementation class of the S3A Filesystem</description>
> </property>
> {noformat}
> I file this jira to solve this issue, and start a conversation if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message