From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14041) CLI command to prune old metadata
Date Thu, 09 Feb 2017 02:09:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858874#comment-15858874 ]

Aaron Fabbri commented on HADOOP-14041:
---------------------------------------

Thank you for the patch.  Overall it looks pretty good.  A couple of things need addressing;
the core-default.xml additions and the InterruptedException handling are the important ones.

{noformat}
+  @InterfaceStability.Unstable
+  public static final String S3GUARD_CLI_PRUNE_AGE =
+      "fs.s3a.s3guard.cli.prune.age";
{noformat}

Probably want a snippet in core-default.xml.
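
Something along these lines, perhaps (the default value and description wording below are just a
sketch, not taken from the patch):

{noformat}
<property>
  <name>fs.s3a.s3guard.cli.prune.age</name>
  <value>86400000</value>
  <description>
    Default age in milliseconds beyond which the prune CLI command will
    delete metadata entries, used when no age is given on the command line.
  </description>
</property>
{noformat}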

{noformat}
+  @InterfaceStability.Unstable
+  public static final String S3GUARD_DDB_BATCH_SLEEP_MSEC_KEY =
+      "fs.s3a.s3guard.ddb.batch.sleep";
+  public static final int S3GUARD_DDB_BATCH_SLEEP_MSEC_DEFAULT = 25;
{noformat}

Same here.  Also wondering if we should call this "...ddb.prune.batch.sleep" so as not to cause
confusion with stuff like HADOOP-13904.  I think prune is going to remain a special case since
it is a background-priority job.  We could also call it "...ddb.background.sleep" to future-proof
it for other background tasks (e.g. if we introduced a background scrubber or integrity checker?).
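
Whichever name we pick, a corresponding core-default.xml entry would help too; roughly this
(description wording is a sketch, the 25ms default is from the patch):

{noformat}
<property>
  <name>fs.s3a.s3guard.ddb.batch.sleep</name>
  <value>25</value>
  <description>
    Delay in milliseconds between batched DynamoDB write requests issued
    by background operations such as prune.
  </description>
</property>
{noformat}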

{noformat}
+    deletionBatch.add(path);
+          if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) {
+            Thread.sleep(delay);
+            processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
+          }
+        } catch (IOException e) {
+          LOG.error(e.getMessage());
+        }
+        if (deletionBatch.size() > 0) {
+          Thread.sleep(delay);
+          processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
+        }
{noformat}

Minor nit: I would make the sleep happen between batches (not before the first one), e.g.:

{noformat}
long batchCount = 0;
...
deletionBatch.add(path);
if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) {
    if (batchCount++ > 0) {    // don't sleep before first batch
        Thread.sleep(delay); 
    }
    processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
...
{noformat}

You could also use that for a useful log message: {{LOG.debug("Finished processing {} batches",
batchCount);}}

{noformat}
+    } catch (InterruptedException e) {
+      LOG.warn("Pruning operation was interrupted!");
+    }
{noformat}
You need to propagate this exception, or set the thread's interrupt status.
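
e.g. something roughly like this (just a sketch of the two options; InterruptedIOException may fit
better if the surrounding method already throws IOException):

{noformat}
    } catch (InterruptedException e) {
      LOG.warn("Pruning operation was interrupted");
      // Option 1: restore the thread's interrupt status so callers can detect it.
      Thread.currentThread().interrupt();
      // Option 2: propagate it as an IOException subclass instead, e.g.
      //   throw new InterruptedIOException("Pruning was interrupted");
    }
{noformat}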






> CLI command to prune old metadata
> ---------------------------------
>
>                 Key: HADOOP-14041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14041-HADOOP-13345.001.patch, HADOOP-14041-HADOOP-13345.002.patch, HADOOP-14041-HADOOP-13345.003.patch
>
>
> Add a CLI command that allows users to specify an age at which to prune metadata that
> hasn't been modified for an extended period of time. Since the primary use-case targeted
> at the moment is list consistency, it would make sense (especially when authoritative=false)
> to prune metadata that is expected to have become consistent a long time ago.


