hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13760) S3Guard: add delete tracking
Date Tue, 23 May 2017 00:15:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020445#comment-16020445 ]

Aaron Fabbri commented on HADOOP-13760:

Thanks for the great work on this feature [~mackrorysd].  I still have a couple of things
I need to review more carefully (e.g. your test cases, MetadataStoreListFilesIterator) but
here is a batch of comments to start on.  Most of my comments are style / design / clarity
things.  Most of the code looks correct.

- Changes to InconsistentAmazonS3Client.  I still need to re-review this part.

@@ -668,6 +681,76 @@ public LocatedFileStatus next() throws IOException {

+   * Wraps another iterator and filters out files that appear in the provided
+   * set of tombstones.  Will read ahead in the iterator when necessary to
+   * ensure that emptiness is detected early enough if only deleted objects
+   * remain in the source iterator.
+   */
+  static class TombstoneReconcilingIterator implements
+      RemoteIterator<LocatedFileStatus> {

Unit test for this Iterator?  It looks pretty straightforward, but unit
test would be ideal.
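
For illustration, a unit test could drive the iterator through a plain in-memory source.  The sketch below is self-contained, so Hadoop's RemoteIterator<LocatedFileStatus> is replaced by an Iterator<String> over path strings (a stand-in, not the real API); the read-ahead and tombstone-skipping behavior is what a real test would want to pin down:

```java
import java.util.*;

// Minimal stand-in for the proposed TombstoneReconcilingIterator: wraps an
// iterator of path strings and skips any entry present in the tombstone set.
public class TombstoneFilterDemo {
  private final Iterator<String> source;
  private final Set<String> tombstones;
  private String next;  // read-ahead slot so hasNext() is accurate

  public TombstoneFilterDemo(Iterator<String> source, Set<String> tombstones) {
    this.source = source;
    this.tombstones = tombstones == null ? new HashSet<>() : tombstones;
    advance();
  }

  // Pull entries from the source until one survives tombstone filtering.
  private void advance() {
    next = null;
    while (source.hasNext()) {
      String candidate = source.next();
      if (!tombstones.contains(candidate)) {
        next = candidate;
        return;
      }
    }
  }

  public boolean hasNext() { return next != null; }

  public String next() {
    if (next == null) {
      throw new NoSuchElementException();
    }
    String result = next;
    advance();
    return result;
  }

  // Helper for assertions: drain the filtered iterator into a list.
  public static List<String> filter(List<String> entries, Set<String> tombstones) {
    TombstoneFilterDemo it = new TombstoneFilterDemo(entries.iterator(), tombstones);
    List<String> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  public static void main(String[] args) {
    // Deleted entries are skipped, including trailing ones (read-ahead case).
    if (!filter(List.of("/a", "/b", "/c"), Set.of("/b", "/c")).equals(List.of("/a"))) {
      throw new AssertionError("trailing tombstones not filtered");
    }
    // A source containing only tombstoned entries must report empty up front.
    if (!filter(List.of("/x"), Set.of("/x")).isEmpty()) {
      throw new AssertionError("all-tombstoned source should be empty");
    }
  }
}
```

The two cases in main() cover exactly the subtlety the javadoc calls out: emptiness must be detected early even when only deleted objects remain in the source.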

+    public TombstoneReconcilingIterator(RemoteIterator<LocatedFileStatus> iterator,
+                                        Set<Path> tombstones) {
+      this.iterator = iterator;
+      if (tombstones != null) {
+        this.tombstones = tombstones;
+      } else {
+        this.tombstones = new HashSet<>();
+      }
+    }

+   * Accept all entries except those which map to S3N pseudo directory markers.
+   */
+  static class AcceptAllButS3nDirs extends AcceptAllButSelfAndS3nDirs {
+    public AcceptAllButS3nDirs(Path qualifiedPath) {
+      super(qualifiedPath);
+    }
+    @Override
+    public boolean accept(Path keyPath, S3ObjectSummary summary) {
+      return !summary.getKey().endsWith(S3N_FOLDER_SUFFIX);
+    }

What about the two other accept() variants?

- innerRename() changes.  My diff-of-diffs says this code is about the same as the last patch
we reviewed together.  Sound right?  If so, skipping it for now.


       DirListingMetadata dirMeta = metadataStore.listChildren(path);
       if (allowAuthoritative && dirMeta != null && dirMeta.isAuthoritative())
-        return S3Guard.dirMetaToStatuses(dirMeta);
+        return S3Guard.dirMetaToStatuses(dirMeta.withoutTombstones());
Would be more (garbage) efficient to let dirMetaToStatuses() handle the tombstone
filtering itself, rather than allocating a filtered copy of the listing first.
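
To illustrate the one-pass idea, here is a hedged sketch with a stand-in PathEntry class in place of Hadoop's PathMetadata: skipping tombstones during the conversion avoids materializing the intermediate withoutTombstones() copy.

```java
import java.util.*;

public class DirMetaToStatusesDemo {
  // Stand-in for Hadoop's PathMetadata: a path plus an isDeleted flag.
  static class PathEntry {
    final String path;
    final boolean deleted;
    PathEntry(String path, boolean deleted) {
      this.path = path;
      this.deleted = deleted;
    }
  }

  // One pass, one output allocation: deleted entries are skipped while
  // converting, so no intermediate tombstone-free listing is created.
  static List<String> toStatuses(Collection<PathEntry> listing) {
    List<String> statuses = new ArrayList<>(listing.size());
    for (PathEntry e : listing) {
      if (!e.deleted) {
        statuses.add(e.path);
      }
    }
    return statuses;
  }

  public static void main(String[] args) {
    List<PathEntry> listing = List.of(
        new PathEntry("/d/live", false),
        new PathEntry("/d/gone", true));
    if (!toStatuses(listing).equals(List.of("/d/live"))) {
      throw new AssertionError("tombstoned entry should be filtered out");
    }
  }
}
```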

- Refactor of s3GetFileStatus() changes to use isEmpty*() helper functions.  Looks good. 
Code is more readable for sure.

+  private boolean isEmptyOfKeys(Collection<String> paths, Set<Path>
+      tombstones) {
+    if (tombstones == null) {
+      return paths.isEmpty();
+    }
+    for (String path : paths) {
+      Path absolute = qualify(new Path("/" + path));

 keyToQualifiedPath() here?

-      queue.addAll(metadataStore.listChildren(path).getListing());
+      DirListingMetadata meta = metadataStore.listChildren(path);
+      if (meta != null) {
+        Collection<PathMetadata> more = meta.withoutTombstones().getListing();
+        if (!more.isEmpty()) {
+          queue.addAll(more);
+        }

Just a note: Old code looks like it could NPE since listChildren() is nullable.

+  public boolean isDeleted() {
+    return isDeleted;
+  }

(In DirListingMetadata) Is this needed?  DirListingMetadata is analogous to listStatus in
that it gives state for the stuff *in* a directory, but not for the directory itself.  My
feeling is this should be omitted, but I could be convinced otherwise.

+  /** Replace an entry with a tombstone.
+   *
+   * @param childPath path of entry to replace.
+   */
Nit. Move first comment sentence down a line.

+  public void delete(Path childPath) {
+    checkChildPath(childPath);
+    listMap.put(childPath, PathMetadata.tombstone(childPath));
+  }
We now have both remove() and delete() here.  I'm concerned folks will confuse
them.  Can we rename delete() to markDeleted()?

+  @Override
+  public void deleteMetadata(Path path) throws IOException {
+    innerDelete(path, false);
+  }

"clearMetadata()" or "forgetMetadata()"?  Superclass (interface) should note this is exposed
for testing, assuming we want to keep it.  We could, instead of adding this new public function,
extend {{MetadataStore#destroy()}} and add a boolean "deleteInstances".  When false, destroy()
would clear all data, but leave resources such as tables behind.  When true, underlying resources
such as dynamo tables would be deleted.  The actual meaning of true/false would be
implementation-specific; this is a hint, and implementations may ignore the boolean flag.

Tests could make liberal use of destroy(false) to clean up after test runs.  The more targeted
delete here is nice for finer-grained testing though.
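
A toy in-memory store shows the shape the destroy(boolean) extension could take.  The names here, including the deleteInstances flag, follow the suggestion above but are otherwise hypothetical, and the map merely stands in for a real backing resource like a dynamo table:

```java
import java.util.*;

public class DestroyFlagDemo {
  static class InMemoryStore {
    // The map stands in for a backing resource such as a DynamoDB table.
    private Map<String, String> table = new HashMap<>();

    void put(String path, String meta) { table.put(path, meta); }
    boolean hasTable() { return table != null; }
    int size() { return table == null ? 0 : table.size(); }

    // deleteInstances=false: clear all data but keep the resource (cheap
    // cleanup between test runs).  deleteInstances=true: tear the resource
    // down.  Real implementations may treat the flag as a hint and ignore it.
    void destroy(boolean deleteInstances) {
      if (deleteInstances) {
        table = null;
      } else if (table != null) {
        table.clear();
      }
    }
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    store.put("/a", "meta");
    store.destroy(false);
    if (!store.hasTable() || store.size() != 0) {
      throw new AssertionError("destroy(false) should clear data but keep the table");
    }
    store.destroy(true);
    if (store.hasTable()) {
      throw new AssertionError("destroy(true) should remove the table");
    }
  }
}
```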

+  private Item getConsistentItem(PrimaryKey key) {
+    final GetItemSpec spec = new GetItemSpec()
+        .withPrimaryKey(key)
+        .withConsistentRead(true); // strictly consistent read
+    return table.getItem(spec);
+  }

 Do you want to use this in fullPathsToPut() as well?

       LOG.trace("Listing table {} in region {} for {} returning {}",
           tableName, region, path, metas);

-      return (metas.isEmpty() && get(path) == null)
+      PathMetadata dir;
+      return (metas.isEmpty() && ((dir = get(path)) == null || dir.isDeleted()))

 Not sure we need this change.. See previous comment on DirListingMetadata#isDeleted()

@@ -207,6 +212,9 @@ public static S3AFileStatus putAndReturn(MetadataStore ms,

     boolean changed = false;
     for (FileStatus s : backingStatuses) {
+      if (deleted.contains(s.getPath())) {
+        continue;
+      }

I think there is still a comment you can remove above this: "// HADOOP-13760: filter out deleted ..."

+   * {@link MetadataStore#delete(Path)}.
+   *
+   * @param path the path to delete
+   * @throws IOException if there is an error
+   */
+  void deleteMetadata(Path path) throws IOException;
Annotation or comment that this is for testing only?  I wonder if, instead
of this extra function in the public interface, we should just use destroy()
instead.  See comment above about making destroy() more flexible.  If we keep this I'd rename
it (e.g. clearMetadata() or forgetMetadata(), as above).

@@ -77,6 +79,7 @@ public void testEmptyDirs() throws Exception {
     // MetadataStore started, *does* cause the directory to be marked empty.
     fs.delete(existingFile, false);
     status = fs.innerGetFileStatus(existingDir, true);
+    configuredMs.deleteMetadata(existingDir);
Wonder if this should be in a try/finally, or in an @After codepath
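
The concern is that a failing assertion earlier in the test would skip the cleanup line and leave stale metadata behind for later tests.  A minimal illustration of the try/finally discipline, with a plain Set standing in for the MetadataStore:

```java
import java.util.*;

public class CleanupDemo {
  // Stands in for the MetadataStore entries a test might leave behind.
  static Set<String> store = new HashSet<>();

  // The cleanup in finally runs whether or not the "test body" throws,
  // which is the same guarantee a JUnit @After method would give.
  static void runTest(boolean fail) {
    store.add("/existingDir");
    try {
      if (fail) {
        throw new AssertionError("simulated test failure");
      }
    } finally {
      store.remove("/existingDir");
    }
  }

  public static void main(String[] args) {
    try {
      runTest(true);
    } catch (AssertionError expected) {
      // the failure propagates, but the cleanup already ran
    }
    if (!store.isEmpty()) {
      throw new AssertionError("stale metadata left behind after a failing test");
    }
    runTest(false);
    if (!store.isEmpty()) {
      throw new AssertionError("stale metadata left behind after a passing test");
    }
  }
}
```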

+  @Test
+  public void testConsistentListAfterRename() throws Exception {
+    Path[] mkdirs = {
+      path("d1/f"),
+      path("d1/f" + InconsistentAmazonS3Client.DELAY_KEY_SUBSTRING)
+    };
Nice to have: Some function comments describing the general idea of these tests.

> S3Guard: add delete tracking
> ----------------------------
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, HADOOP-13760-HADOOP-13345.002.patch,
> HADOOP-13760-HADOOP-13345.003.patch, HADOOP-13760-HADOOP-13345.004.patch, HADOOP-13760-HADOOP-13345.005.patch,
> HADOOP-13760-HADOOP-13345.006.patch, HADOOP-13760-HADOOP-13345.007.patch, HADOOP-13760-HADOOP-13345.008.patch,
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add delete
> tracking.  Current behavior on delete is to remove the metadata from the MetadataStore.  To make
> deletes consistent, we need to add a {{isDeleted}} flag to {{PathMetadata}} and check it when
> returning results from functions like {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651,
> I added TODO comments in most of the places these new conditions are needed.  The work does
> not look too bad.

This message was sent by Atlassian JIRA
