hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13164) Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
Date Wed, 28 Sep 2016 11:38:20 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Steve Loughran updated HADOOP-13164:
    Attachment: HADOOP-13164-branch-005.patch

# using {{ArrayList<>}} for list; no need to go near guava if we can help it.
# tuned loop to not be a while true + break on root, but just be a {{while(!isRootPath(f))}}
# moved check for an empty list into {{removeKeys()}}, so it is everywhere.
# cut test out of {{ITestS3AContractRename}}. This test is *only* for testing compliance with
the FS spec, and were it not for consistency problems surfacing, should have no test cases
at all, just a materialized subclass of {{AbstractContractRenameTest}}
# moved this test in {{ITestS3AFileOperationCost.testFakeDirectoryDeletion then added assertions
about metric state during all the operations.
# cut test case from {{ITestS3ADirectoryPerformance}} as it didn't contain a single assertion.
This is not a test; it is a demo.

I really want to call out the test changes. A key benefit of adding all those metrics is that
it lets us make assertions about the no of requests made of S3, *independent of execution
time*. The tests in patch 004 don't really check *anything*. They log stuff, but that's for
humans to read, not anything which can be automated in a jenkins build.

* the initial purpose of tests is to demonstrate that the code not only behaves as expected,
but doesn't break when you try to break it
* the secondary purpose of the test is to catch regressions: to make sure your work holds
even in the presence of changes by others (including your future self).
* all tests need to be fully automated: the results must be verifiable in Jenkins, rather
than reviewed in the logs.

This is [the style guide|https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md]
I try to follow when coding tests, and what I like to see in reviews. Yes, it usually makes
writing the tests harder than the actual production code. Which is good: production code is
ideally simple and reliable. Tests are are designed to break that code and use assertions
to show that they have broken it. This takes effort —but is necessary to keep and improve
code quality.

> Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories
> --------------------------------------------------------
>                 Key: HADOOP-13164
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13164
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13164-branch-005.patch, HADOOP-13164-branch-2-003.patch,
HADOOP-13164-branch-2-004.patch, HADOOP-13164.branch-2-002.patch, HADOOP-13164.branch-2.WIP.002.patch,
> https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e00d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1224
> deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during rename and on outputstream
close() to purge any fake directories. Depending on the nesting in the folder structure, it
might take a lot longer time as it invokes getFileStatus multiple times.  Instead, it should
be able to break out of the loop once a non-empty directory is encountered. 

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message