hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14041) CLI command to prune old metadata
Date Sat, 18 Feb 2017 01:17:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872846#comment-15872846

Aaron Fabbri commented on HADOOP-14041:

Thanks for the follow-up patch [~mackrorysd].  Looks good.. Of the comments below, I think
the important ones are the prune() method prototype, and errors going to stderr.

+  public void testPruneDirs() throws Exception {
+    // This test does not necessarily define required behavior: directories
+    // that become empty after a prune operation could be cleaned up, but
+    // currently they don't because if a file was created in that directory
+    // mid-prune, it would violate the invariant that all ancestors of a file

Tiny nit: this invariant is an implementation detail of the dynamo MS.  Not a MetadataStore
invariant per se.  Could mention the word dynamo here.

+    // exist in the metastore. If an implementation could satisfy this, it
+    // would be okay for this test not to pass.
+    Assume.assumeFalse(ms instanceof NullMetadataStore);
+    createNewDirs("/pruneDirs/dir");

Did you mean to change this Assume to call {{supportsPruning()}}?
Technically, seems like you should use that, and maybe {{allowMissing()}}?  Basically, when
allowMissing() returns true, the metadata store may not return results you just put into it
(like a cache where something got evicted before you asked for it again).

--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
+++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
@@ -165,4 +165,15 @@ void move(Collection<Path> pathsToDelete, Collection<PathMetadata>
    * @throws IOException if there is an error
   void destroy() throws IOException;
+  /**
+   * Clear any metadata older than a specified time from the repository. Note
+   * that modification times should be in UTC, as returned by System
+   * .currentTimeMillis at the time of modification.
+   *
+   * @param modTime Oldest modification time to allow
+   * @throws IOException if there is an error
+   * @throws InterruptedException if the process is interrupted
+   */
+  void prune(long modTime) throws InterruptedException, IOException;
Couple of things:
1. We should mention here that implementations:  *must* clear any file metadata older than
modTime, *may* clear any directory metadata older than modTime, and throw an UnsupportedOperationException(*)
2. Instead of declaring a checked exception (InterruptedException), IMO, that should always
be wrapped in an IOException.. So this should only be throws IOException.

(*) [~stevel@apache.org] is this the idiomatic thing to do here in Hadoop?

--- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
+++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
@@ -87,6 +87,10 @@ public void destroy() throws IOException {
+  public void prune(long modTime) throws IOException {
+  }
Love the algorithm here.   Classic no-op, my fave.

+      if (confDelta <= 0 && cliDelta <= 0) {
+        System.out.println(
+            "You must specify a positive age for metadata to prune.");
+      }
I think this should go to stderr (search for "stderr" [here|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]).

--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestNullMetadataStore.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestNullMetadataStore.java
@@ -51,6 +51,12 @@ public boolean allowMissing() {
     return true;
+  /** This MetadataStore won't store anything, so there's nothing to prune. */
+  @Override
+  public boolean supportsPruning() {
+    return false;
+  }

This part of the change could be left out, I think?  NullMetadataStore always prunes!  Where
prune is defined as removing anything older than X.. always true for empty set.  :-)

> CLI command to prune old metadata
> ---------------------------------
>                 Key: HADOOP-14041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14041-HADOOP-13345.001.patch, HADOOP-14041-HADOOP-13345.002.patch,
HADOOP-14041-HADOOP-13345.003.patch, HADOOP-14041-HADOOP-13345.004.patch, HADOOP-14041-HADOOP-13345.005.patch,
> Add a CLI command that allows users to specify an age at which to prune metadata that
hasn't been modified for an extended period of time. Since the primary use-case targeted at
the moment is list consistency, it would make sense (especially when authoritative=false)
to prune metadata that is expected to have become consistent a long time ago.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message