hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files
Date Wed, 21 Jun 2017 00:10:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056703#comment-16056703
] 

Aaron Fabbri commented on HADOOP-14457:
---------------------------------------

[~mackrorysd] and I just chatted about this and I think we came up with a better option. 
Instead of adding a "recursive" put to MetadataStore, I suggested a batched put.

The main thing I disliked about adding a new method {{MetadataStore#putRecursive(PathMetadata
p)}} is that it is not explicit as to which ancestors are created.  It is this sort of sloppy
method "I'm creating this thing, make sure the ancestors exist" instead of an explicit "here
are the paths that I'm creating".  The former makes it hard to use MetadataStore interface
for precise metadata logging.

One of the problems with requiring the FS client to put() each path it creates is that, since
dynamo has to create ancestors internally for its schema, DynamoDB ends up having to do extra
round trips (both FS and dynamo making sure ancestors are created).  [~mackrorysd] rightfully
pushed back on this.  Even with the capability check I suggested (which should be correct
but not trivial to reason about), Dynamo would be affected by extra round trips once we added
authoritative listing support (HADOOP-14154).

We both seemed to agree that adding a batched {{MetadataStore#put(Collection<PathMetadata>
      pathsToCreate)}} seemed like a better option than the capability check.  The dynamo
implementation can use existing logic from its move() implementation to pre-process the list
to include ancestors in the batched put list.  Dynamo gets optimal round trips, now and in
the future, and MetadataStore is still able to express explicit logging of metadata changes.

Thanks for great discussion on this [~mackrorysd], please add anything I missed.

> create() does not notify metadataStore of parent directories or ensure they're not existing
files
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14457
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14457
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14457-HADOOP-13345.001.patch, HADOOP-14457-HADOOP-13345.002.patch,
HADOOP-14457-HADOOP-13345.003.patch, HADOOP-14457-HADOOP-13345.004.patch, HADOOP-14457-HADOOP-13345.005.patch,
HADOOP-14457-HADOOP-13345.006.patch, HADOOP-14457-HADOOP-13345.007.patch, HADOOP-14457-HADOOP-13345.008.patch,
HADOOP-14457-HADOOP-13345.009.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. LocalMetadataStore
will sometimes erroneously report that a directory is empty with isAuthoritative = true when
it *definitely* has children the metadatastore should know about. It doesn't appear to happen
if the children are just directory. The fact that it's returning an empty listing is concerning,
but the fact that it says it's authoritative *might* be a second bug.
> {code}
> diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>    }
>  
>    @VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>      return metadataStore;
>    }
>  
> diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> @@ -23,6 +23,11 @@
>  import org.apache.hadoop.fs.contract.AbstractFSContract;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.s3a.S3AFileSystem;
> +import org.apache.hadoop.fs.s3a.Tristate;
> +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
> +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
> +import org.junit.Test;
>  
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
> @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws Throwable {
>      boolean rename = fs.rename(srcDir, destDir);
>      assertFalse("s3a doesn't support rename to non-empty directory", rename);
>    }
> +
> +  @Test
> +  public void testMkdirPopulatesFileAncestors() throws Exception {
> +    final FileSystem fs = getFileSystem();
> +    final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
> +    final Path parent = path("testMkdirPopulatesFileAncestors/source");
> +    try {
> +      fs.mkdirs(parent);
> +      final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
> +      byte[] srcDataset = dataset(256, 'a', 'z');
> +      writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
> +          1024, false);
> +
> +      DirListingMetadata list = ms.listChildren(parent);
> +      assertTrue("MetadataStore falsely reports authoritative empty list",
> +          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
> +    } finally {
> +      fs.delete(parent, true);
> +    }
> +  }
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message