hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-14457) LocalMetadataStore falsely reports empty parent directory authoritatively
Date Fri, 26 May 2017 19:55:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026745#comment-16026745
] 

Aaron Fabbri edited comment on HADOOP-14457 at 5/26/17 7:54 PM:
----------------------------------------------------------------

First, I'm happy testing authoritative mode w/ LocalMetadataStore exposed a logic bug in the
FS client.  Very cool.

Yep [~mackrorysd] the main bug here seems to be that S3A does not report the directories it
implicitly creates to the metadatastore.  We could change MetadataStore to have implicit ancestor
creation semantics like {{FileSystem#create()}}, but I favor the current semantics, mostly
because the FS client already often has to enumerate ancestors (e.g. create and mkdir), and
doing it again inside MetadataStore seems wasteful (especially since these will be network
round trips, and it sometimes will not be necessary).

In your test code from JIRA description:
{noformat}
+      assertTrue("MetadataStore falsely reports authoritative empty list",
+          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
{noformat}
The check of {{!list.isAuthoritative()}} is redundant.  {{FALSE}} means we know it is empty.
 If you look at {{DirListingMetadata.isEmpty()}} you'll see it is already considered there.

In your patch:
{noformat}
+    List<Path> metadataStoreDirs = null;
+    if (hasMetadataStore()) {
+      metadataStoreDirs = new ArrayList<>();
+    }
+    Path parent = qualify(f.getParent());
+    while (!parent.isRoot()) {
+      DirectoryStatus directory = checkPathForDirectory(parent);
+      if(directory == DirectoryStatus.EXISTS_AND_IS_FILE) {
+        throw new FileAlreadyExistsException("Parent directory " + parent +
+            " already exists as a file");
+      } else if (directory == DirectoryStatus.DOES_NOT_EXIST ||
+          directory == DirectoryStatus.EXISTS_AND_IS_DIRECTORY_ON_S3_ONLY) {
+        if (metadataStoreDirs != null) {
+          metadataStoreDirs.add(parent);
+        }
+      } else if (directory ==
+          DirectoryStatus.EXISTS_AND_IS_DIRECTORY_ON_METADATASTORE) {
+        break;
+      }
+      parent = parent.getParent();
+    }
+    S3Guard.makeDirsOrdered(metadataStore, metadataStoreDirs, username, false);
{noformat}

I'm wondering, both here and in your recent mkdirs() change, if we cannot simplify this and
just use getFileStatus() instead of your checkPathForDirectory()




was (Author: fabbri):
First, I'm happy testing authoritative mode w/ LocalMetadataStore exposed a logic bug in the
FS client.  Very cool.

Yep [~mackrorysd] the main bug here seems to be that S3A does not report the directories it
implicitly creates to the metadatastore.  We could change MetadataStore to have implicit ancestor
creation semantics like {{FileSystem#create()}}, but I favor the current semantics, mostly
because the FS client already often has to enumerate ancestors (e.g. create and mkdir), and
doing it again inside MetadataStore seems wasteful (especially since these will be network
round trips, and it sometimes will not be necessary).

In your test code from JIRA description:
{quote}
+      assertTrue("MetadataStore falsely reports authoritative empty list",
+          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
{quote}
The check of {{!list.isAuthoritative()}} is redundant.  {{FALSE}} means we know it is empty.
 If you look at {{DirListingMetadata.isEmpty()}} you'll see it is already considered there.

In your patch:
{quote}
+    List<Path> metadataStoreDirs = null;
+    if (hasMetadataStore()) {
+      metadataStoreDirs = new ArrayList<>();
+    }
+    Path parent = qualify(f.getParent());
+    while (!parent.isRoot()) {
+      DirectoryStatus directory = checkPathForDirectory(parent);
+      if(directory == DirectoryStatus.EXISTS_AND_IS_FILE) {
+        throw new FileAlreadyExistsException("Parent directory " + parent +
+            " already exists as a file");
+      } else if (directory == DirectoryStatus.DOES_NOT_EXIST ||
+          directory == DirectoryStatus.EXISTS_AND_IS_DIRECTORY_ON_S3_ONLY) {
+        if (metadataStoreDirs != null) {
+          metadataStoreDirs.add(parent);
+        }
+      } else if (directory ==
+          DirectoryStatus.EXISTS_AND_IS_DIRECTORY_ON_METADATASTORE) {
+        break;
+      }
+      parent = parent.getParent();
+    }
+    S3Guard.makeDirsOrdered(metadataStore, metadataStoreDirs, username, false);
{quote}

I'm wondering, both here and in your recent mkdirs() change, if we cannot simplify this and
just use getFileStatus() instead of your checkPathForDirectory()



> LocalMetadataStore falsely reports empty parent directory authoritatively
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-14457
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14457
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>         Attachments: HADOOP-14457-HADOOP-13345.001.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. LocalMetadataStore
will sometimes erroneously report that a directory is empty with isAuthoritative = true when
it *definitely* has children the metadatastore should know about. It doesn't appear to happen
if the children are just directory. The fact that it's returning an empty listing is concerning,
but the fact that it says it's authoritative *might* be a second bug.
> {code}
> diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>    }
>  
>    @VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>      return metadataStore;
>    }
>  
> diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> @@ -23,6 +23,11 @@
>  import org.apache.hadoop.fs.contract.AbstractFSContract;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.s3a.S3AFileSystem;
> +import org.apache.hadoop.fs.s3a.Tristate;
> +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
> +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
> +import org.junit.Test;
>  
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
> @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws Throwable {
>      boolean rename = fs.rename(srcDir, destDir);
>      assertFalse("s3a doesn't support rename to non-empty directory", rename);
>    }
> +
> +  @Test
> +  public void testMkdirPopulatesFileAncestors() throws Exception {
> +    final FileSystem fs = getFileSystem();
> +    final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
> +    final Path parent = path("testMkdirPopulatesFileAncestors/source");
> +    try {
> +      fs.mkdirs(parent);
> +      final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
> +      byte[] srcDataset = dataset(256, 'a', 'z');
> +      writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
> +          1024, false);
> +
> +      DirListingMetadata list = ms.listChildren(parent);
> +      assertTrue("MetadataStore falsely reports authoritative empty list",
> +          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
> +    } finally {
> +      fs.delete(parent, true);
> +    }
> +  }
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message