hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14154) Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
Date Fri, 03 Aug 2018 04:58:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567775#comment-16567775

Aaron Fabbri commented on HADOOP-14154:

Thanks for the v2 patch.  The DDBPathMetadata stuff turned out pretty good.  Was thinking
we could probably reduce garbage a bit with some additional constructors for that class, but
actually I think JVM escape analysis will handle most of those extra allocations on the stack. 
The lists of metadata are harder to deal with but I think your code looks fine here.

Some inline comments...
@@ -840,21 +861,36 @@ public void prune(long modTime, String keyPrefix) throws IOException
           new ArrayList<>(S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT);
       int delay = conf.getInt(S3GUARD_DDB_BACKGROUND_SLEEP_MSEC_KEY,
+      Set<Path> parentPathSet =  new HashSet<>();
       for (Item item : expiredFiles(modTime, keyPrefix)) {
-        PathMetadata md = PathMetadataDynamoDBTranslation
+        DDBPathMetadata md = PathMetadataDynamoDBTranslation
             .itemToPathMetadata(item, username);
         Path path = md.getFileStatus().getPath();
+        // add parent path of what we remove
+        Path parentPath = path.getParent();
+        parentPathSet.add(parentPath);
What if parentPath is root dir? I think we want a null check here.
+  private void removeAuthoritativeDirFlag(Set<Path> pathSet) {
+    Set<DDBPathMetadata> metas = pathSet.stream().map(path -> {
+      try {
+        DDBPathMetadata ddbPathMetadata = get(path);
+        if(ddbPathMetadata == null) {
+          return null;
+        }
+        LOG.debug("Setting false isAuthoritativeDir on {}", ddbPathMetadata);
+        ddbPathMetadata.setAuthoritativeDir(false);
+        return ddbPathMetadata;
+      } catch (IOException e) {
+        String msg = String.format("IOException while getting PathMetadata "
+            + "on path: %s.", path);
+        LOG.error(msg, e);
+        return null;
+      }
+    }).filter(Objects::nonNull).collect(Collectors.toSet());
I like that the stream keeps running if one of the paths fail, but should we also save a reference
to the IOException and then throw it at the end of the function? That way we keep working
even if we get a failure, but the failure still gets propagated to the caller.
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
@@ -727,6 +727,13 @@ public void testPruneUnsetsAuthoritative() throws Exception {
         new FileStatus(0, false, 0, 0, time + 1, strToPath(freshFile)),
         Tristate.FALSE, false));
+    // set parent dir as authoritative
+    if (!allowMissing()) {
+      DirListingMetadata parentDirMd = ms.listChildren(strToPath(parentDir));
+      parentDirMd.setAuthoritative(true);
+      ms.put(parentDirMd);
+    }
Looks like you found a bug in the existing test case? Nice work.

I was wondering if we want an integration test to confirm forward and backward compatibility
with the schema change (old S3a works with new schema rows and vice versa). Not sure how we'd
implement that though. Would probably need a separate copy of a couple of the PathMetadataDynamoDBTranslation
functions with the old logic in them (or better yet, a boolean flag to select old behavior
w/o the read/write of the auth flag), and then use those in the DDB integration test to confirm
it all works. I'm not sure what this buys us in terms of regression testing though--so I could
see the argument for manual testing. What do you think?

> Persist isAuthoritative bit in DynamoDBMetaStore (authoritative mode support)
> -----------------------------------------------------------------------------
>                 Key: HADOOP-14154
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14154
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1
>            Reporter: Rajesh Balamohan
>            Assignee: Gabor Bota
>            Priority: Minor
>         Attachments: HADOOP-14154-HADOOP-13345.001.patch, HADOOP-14154-HADOOP-13345.002.patch,
HADOOP-14154-spec-001.pdf, HADOOP-14154-spec-002.pdf, HADOOP-14154.001.patch, HADOOP-14154.002.patch
> Add support for "authoritative mode" for DynamoDBMetadataStore.
> The missing feature is to persist the bit set in {{DirListingMetadata.isAuthoritative}}. 
> This topic has been super confusing for folks so I will also file a documentation Jira
to explain the design better.
> We may want to also rename the DirListingMetadata.isAuthoritative field to .isFullListing
to eliminate the multiple uses and meanings of the word "authoritative".

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message