hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15079) ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch
Date Thu, 11 Jan 2018 13:47:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322219#comment-16322219
] 

Steve Loughran commented on HADOOP-15079:
-----------------------------------------

bq. Looks gift horse in the mouth "How did these unrelated annotation changes get in here

Because I went through the entire call chain of what rename was up to and what their policy
was, when things were unmarked their policy was reviewed & updated. As noted, it's some
s3guard translation which is what we need to wrap up (straightforward).

bq. us I think you've changed listLocatedStatus() to Mixed.

OK, I thought it was superfluous, but I will (a) trace it throough and (b) reinstate anyway.
(Done)

Also: added annotations for Once/Retry ExceptionsSwallowed, and tagged where this is done
in the dir/finishWrite calls.
I can see this becoming a maintenance PITA, but it does force us to be consistent. I wonder
if you could do some tool which
analysed transient use of invoked methods for consistency...but you'd need to look at try/catch
logic too. Hard.

bq. Do you want an {{if (outcome) }} condition around maybeCreateFakeParent()?

good idea. Done

bq. . Now that finishedWrite() does S3Guard.addAncestors() (IIRC it did not originally), maybe
we don't need this and the whole makeDirsOrdered() thing can go away. A FS with real directories
would still probably want to use that function but .. it'll always be in git history. I'm
fine with trying it out here, or doing a separate JIRA, 'sup to you.

Let's do it here, but then we'll need the extra review of that. Note that removing it reduces
the IOPs used, helping keep DDB costs down: tangible benefits

bq. Wow, good catch. Looks like that has been there since S3A was committed.

We just have to accept that sometimes, code you consider stable turns out to have bugs of
some kind or other. Like, say, CPU parts :)

This is something we can/should backport on its own, as it is adding time & cost to every
rename. Once this is in I'll do a micro-backport of the != fix, not worrying about test coverage.

bq. Your tests sharing a path instance implies the you were renaming from root dir to root
dir (the only case Path#getParent() doesn't allocate a new Path). Sound right?

Actually, I don't think we have a test for that anywhere; there's nothing in abstract contract
root directory test. Should be though, really. I just assumed that the mismatch was part of
the cause of extra delete calls --maybe it was just unrelated but while doing the due diligence
I noticed it in the debugger.

* Added a new test {{ITestS3AFileOperationCost#testCostOfRootRename()}} to see what goes on
there. Not a lot, thankfully.
* Given this was the first time we've looked at that test suite for a while, did a quick cleanup,
moving to LTU.intercept() where appropriate, and having the entire suite skipped during fault
injection; too much risk of false test failures.

Also, we don't need to skip that test with s3guard on. now know things are fixed. 


w.r.t {{makeDirsOrdered}} in innerMkDirs: I've cut it, everything seems good.

But I've left the method in S3Guard for now, because it discusses issues there which are interesting.
Maybe 

h3. Checkstyle
Fixed important, cut comment too long, and also moved innerRename to maybeAddTrailingSlash
where appropriate...that keeps the line count down there. 

h3. Tests

Tested {{ITestS3AFileOperationCost}} the  with s3guard, s3guard + auth, s3guard + dynamo,
s3guard + dynamo + auth. The set.

Did full suite test run with: {{-Dparallel-tests -DtestsThreadCount=12 -Ds3guard -Dauth -Dscale}}

I did *not* see the problems that Sean saw, which I can't even point to a cause of, especially
the {{AbstractS3GuardToolTestBase.run:95 ยป IllegalArgument}} cases. 

Sean, can you do a clean build & test & stick up stack traces? 

> ITestS3AFileOperationCost#testFakeDirectoryDeletion failing after OutputCommitter patch
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15079
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 3.1.0
>            Reporter: Sean Mackrory
>            Assignee: Steve Loughran
>            Priority: Critical
>         Attachments: HADOOP-15079-001.patch, HADOOP-15079-002.patch
>
>
> I see this test failing with "object_delete_requests expected:<1> but was:<2>".
I printed stack traces whenever this metric was incremented, and found the root cause to be
that innerMkdirs is now causing two calls to delete fake directories when it previously caused
only one. It is called once inside createFakeDirectory, and once directly inside innerMkdirs
later:
> {code}
>         at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2599)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1498)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$11(S3AFileSystem.java:2684)
>         at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:108)
>         at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:259)
>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
>         at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:255)
>         at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:230)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:2682)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:2657)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2021)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
>         at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
>         at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
>         at org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>         at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74
> {code}
> {code}
>         at org.apache.hadoop.fs.s3a.S3AInstrumentation.incrementCounter(S3AInstrumentation.java:454)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.incrementStatistic(S3AFileSystem.java:1108)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObjects$8(S3AFileSystem.java:1369)
>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:313)
>         at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:279)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1366)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1625)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2634)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.innerMkdirs(S3AFileSystem.java:2025)
>         at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:1956)
>         at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2305)
>         at org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
>         at org.apache.hadoop.fs.s3a.ITestS3AFileOperationCost.testFakeDirectoryDeletion(ITestS3AFileOperationCost.java:209)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>         at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message