hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files
Date Fri, 09 Jun 2017 23:29:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045212#comment-16045212
] 

Aaron Fabbri commented on HADOOP-14457:
---------------------------------------

Thank you for updating your patch [~mackrorysd].  The v9 code looks good.

I would +1 this except one concern, which I should have mentioned earlier had it occurred
to me:  This will likely have a negative performance impact for S3Guard w/ Dynamo.  Correct
me if I am wrong, but the main purpose of this code is to fix the fact that S3A's "broken
but fast" create() implementation breaks authoritative (fully-cached) directory listings for
the MetadataStore (since the S3A client is not reporting directory creations which impact
said authoritative listings of ancestors).

In terms of performance with the DynamoDBMetadataStore, however, this code is bad for two
reasons:
1. DynamoDBMetadataStore doesn't implement authoritative listings.
2. DynamoDBMetadataStore already populates ancestors due to internal implementation details.

I do think authoritative listing is valuable though.  Not only for future performance gains
we can get by short-circuiting S3 list, but for the extra testing and logic checks we get
from having the LocalMetadataStore and associated contract tests around it.

I wonder if this is the time to introduce a capabilities query interface on MetadataStore.
 Then we could rename the function to {{S3Guard#addAncestorsIfAuthoritative(..)}} and have
it look like this:

{code}
  /** Add ancestors of qualifiedPath to MetadataStore iff it supports authoritative listings.*/
  public static void addAncestorsIfAuthoritative(MetadataStore metadataStore,
      Path qualifiedPath, String username) throws IOException {
    if (! metadataStore.getOption(SUPPORTS_AUTHORITATIVE_DIRS)) {
      return;
    }
  ...
{code}

I also like the capabilities query idea because it allows us to write stricter MetadataStore
contract tests.

Thanks for your patience on the back and forth on this.  I'd be happy to +1 this and follow
up quickly with a patch that adds the capability stuff + the new if() condition above, if
you and [~stevel@apache.org] or [~liuml07] agree with my comments here.





> create() does not notify metadataStore of parent directories or ensure they're not existing
files
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14457
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14457
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14457-HADOOP-13345.001.patch, HADOOP-14457-HADOOP-13345.002.patch,
HADOOP-14457-HADOOP-13345.003.patch, HADOOP-14457-HADOOP-13345.004.patch, HADOOP-14457-HADOOP-13345.005.patch,
HADOOP-14457-HADOOP-13345.006.patch, HADOOP-14457-HADOOP-13345.007.patch, HADOOP-14457-HADOOP-13345.008.patch,
HADOOP-14457-HADOOP-13345.009.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. LocalMetadataStore
will sometimes erroneously report that a directory is empty with isAuthoritative = true when
it *definitely* has children the metadatastore should know about. It doesn't appear to happen
if the children are just directory. The fact that it's returning an empty listing is concerning,
but the fact that it says it's authoritative *might* be a second bug.
> {code}
> diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>    }
>  
>    @VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>      return metadataStore;
>    }
>  
> diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> @@ -23,6 +23,11 @@
>  import org.apache.hadoop.fs.contract.AbstractFSContract;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.s3a.S3AFileSystem;
> +import org.apache.hadoop.fs.s3a.Tristate;
> +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
> +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
> +import org.junit.Test;
>  
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
> @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws Throwable {
>      boolean rename = fs.rename(srcDir, destDir);
>      assertFalse("s3a doesn't support rename to non-empty directory", rename);
>    }
> +
> +  @Test
> +  public void testMkdirPopulatesFileAncestors() throws Exception {
> +    final FileSystem fs = getFileSystem();
> +    final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
> +    final Path parent = path("testMkdirPopulatesFileAncestors/source");
> +    try {
> +      fs.mkdirs(parent);
> +      final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
> +      byte[] srcDataset = dataset(256, 'a', 'z');
> +      writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
> +          1024, false);
> +
> +      DirListingMetadata list = ms.listChildren(parent);
> +      assertTrue("MetadataStore falsely reports authoritative empty list",
> +          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
> +    } finally {
> +      fs.delete(parent, true);
> +    }
> +  }
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message