Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id C9A59200B9D for ; Wed, 28 Sep 2016 13:38:22 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id C8890160AB4; Wed, 28 Sep 2016 11:38:22 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 188BB160ADD for ; Wed, 28 Sep 2016 13:38:21 +0200 (CEST) Received: (qmail 52010 invoked by uid 500); 28 Sep 2016 11:38:21 -0000 Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list common-issues@hadoop.apache.org Received: (qmail 51725 invoked by uid 99); 28 Sep 2016 11:38:21 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 28 Sep 2016 11:38:21 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id C82DE2C2A65 for ; Wed, 28 Sep 2016 11:38:20 +0000 (UTC) Date: Wed, 28 Sep 2016 11:38:20 +0000 (UTC) From: "Steve Loughran (JIRA)" To: common-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HADOOP-13164) Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Wed, 28 Sep 2016 11:38:23 -0000 [ https://issues.apache.org/jira/browse/HADOOP-13164?page=3Dcom.atlass= ian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13164: ------------------------------------ Attachment: HADOOP-13164-branch-005.patch # using {{ArrayList<>}} for list; no need to go near guava if we can help i= t. # tuned loop to not be a while true + break on root, but just be a {{while(= !isRootPath(f))}} # moved check for an empty list into {{removeKeys()}}, so it is everywhere. # cut test out of {{ITestS3AContractRename}}. This test is *only* for testi= ng compliance with the FS spec, and were it not for consistency problems su= rfacing, should have no test cases at all, just a materialized subclass of = {{AbstractContractRenameTest}} # moved this test in {{ITestS3AFileOperationCost.testFakeDirectoryDeletion = then added assertions about metric state during all the operations. # cut test case from {{ITestS3ADirectoryPerformance}} as it didn't contain = a single assertion. This is not a test; it is a demo. I really want to call out the test changes. A key benefit of adding all tho= se metrics is that it lets us make assertions about the no of requests made= of S3, *independent of execution time*. The tests in patch 004 don't reall= y check *anything*. They log stuff, but that's for humans to read, not anyt= hing which can be automated in a jenkins build. * the initial purpose of tests is to demonstrate that the code not only beh= aves as expected, but doesn't break when you try to break it * the secondary purpose of the test is to catch regressions: to make sure y= our work holds even in the presence of changes by others (including your fu= ture self). * all tests need to be fully automated: the results must be verifiable in J= enkins, rather than reviewed in the logs. This is [the style guide|https://github.com/steveloughran/formality/blob/ma= ster/styleguide/styleguide.md] I try to follow when coding tests, and what = I like to see in reviews. Yes, it usually makes writing the tests harder th= an the actual production code. Which is good: production code is ideally si= mple and reliable. Tests are are designed to break that code and use assert= ions to show that they have broken it. This takes effort =E2=80=94but is ne= cessary to keep and improve code quality. > Optimize S3AFileSystem::deleteUnnecessaryFakeDirectories > -------------------------------------------------------- > > Key: HADOOP-13164 > URL: https://issues.apache.org/jira/browse/HADOOP-13164 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 > Affects Versions: 2.8.0 > Reporter: Rajesh Balamohan > Assignee: Rajesh Balamohan > Priority: Minor > Attachments: HADOOP-13164-branch-005.patch, HADOOP-13164-branch-2= -003.patch, HADOOP-13164-branch-2-004.patch, HADOOP-13164.branch-2-002.patc= h, HADOOP-13164.branch-2.WIP.002.patch, HADOOP-13164.branch-2.WIP.patch > > > https://github.com/apache/hadoop/blob/27c4e90efce04e1b1302f668b5eb22412e0= 0d033/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFil= eSystem.java#L1224 > deleteUnnecessaryFakeDirectories is invoked in S3AFileSystem during renam= e and on outputstream close() to purge any fake directories. Depending on t= he nesting in the folder structure, it might take a lot longer time as it i= nvokes getFileStatus multiple times. Instead, it should be able to break o= ut of the loop once a non-empty directory is encountered.=20 -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: common-issues-help@hadoop.apache.org