hadoop-common-issues mailing list archives

From "Sean Mackrory (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-14041) CLI command to prune old metadata
Date Thu, 02 Feb 2017 17:22:51 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Sean Mackrory updated HADOOP-14041:
    Attachment: HADOOP-14041-HADOOP-13345.001.patch

Attaching a patch that adds prune(timestamp) to the MetadataStore interface and its existing implementations,
along with a CLI tool and tests for all of it. prune() takes a UTC timestamp in milliseconds, as returned by
System.currentTimeMillis(), and should remove every entry with a modification time older than that. The CLI
tool computes the timestamp by subtracting the requested lengths of time from the current time. One subtlety:
minutes are specified with -M, and all the time-range options are upper-case so that they don't clash with -m,
which specifies the metastore URL.
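A minimal sketch of the prune contract and the cutoff computation described above, assuming a toy in-memory store. Everything here is hypothetical: SimpleStore and parseCutoff are illustrative names, not the API in the attached patch. Only -M (minutes) and -m (metastore URL) come from the comment; -D, -H and -S are assumed flag names used to show the upper-case convention.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: SimpleStore and parseCutoff are illustrative names,
// not the API added by HADOOP-14041-HADOOP-13345.001.patch.
public class PruneCutoffSketch {

    /** Toy in-memory store: path -> modification time in epoch millis. */
    static class SimpleStore {
        private final Map<String, Long> modTimes = new HashMap<>();

        void put(String path, long modTimeMillis) {
            modTimes.put(path, modTimeMillis);
        }

        /** Drop every entry modified strictly before cutoffMillis. */
        void prune(long cutoffMillis) {
            modTimes.values().removeIf(t -> t < cutoffMillis);
        }

        boolean contains(String path) {
            return modTimes.containsKey(path);
        }
    }

    /**
     * Cutoff = now minus the total requested age. Only the upper-case age
     * flags are parsed; -M (minutes) is from the comment, while -D, -H and -S
     * are assumed. Lower-case -m (metastore URL) is rejected as "not an age flag".
     */
    static long parseCutoff(String[] args, long nowMillis) {
        long ageMillis = 0;
        for (int i = 0; i + 1 < args.length; i += 2) {
            TimeUnit unit;
            switch (args[i]) {
                case "-D": unit = TimeUnit.DAYS; break;
                case "-H": unit = TimeUnit.HOURS; break;
                case "-M": unit = TimeUnit.MINUTES; break;
                case "-S": unit = TimeUnit.SECONDS; break;
                default: throw new IllegalArgumentException("not an age flag: " + args[i]);
            }
            ageMillis += unit.toMillis(Long.parseLong(args[i + 1]));
        }
        return nowMillis - ageMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        SimpleStore store = new SimpleStore();
        store.put("/old/file", now - TimeUnit.DAYS.toMillis(2));
        store.put("/new/file", now - TimeUnit.MINUTES.toMillis(5));

        // Prune anything older than 1 day and 30 minutes.
        store.prune(parseCutoff(new String[] {"-D", "1", "-M", "30"}, now));
        System.out.println(store.contains("/old/file") + " " + store.contains("/new/file"));
    }
}
```

Running main prints "false true": the two-day-old entry falls before the cutoff and is pruned, while the five-minute-old entry survives. Keeping the age flags upper-case means a single parser never has to guess whether -m means minutes or the metastore URL.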

One thing that probably needs more work is how directories are handled. The local implementation deletes
its record of a directory once all the files it tracks in that directory have been pruned. I should at least
do the equivalent in the DynamoDB implementation, but since empty directories already get some special
handling, that may warrant more thought. I know [~fabbri] has been thinking about the nuances of empty
directories; any thoughts on
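A minimal sketch of the directory behaviour described for the local implementation, assuming a toy model where a directory record exists only because files under it are tracked. DirPruneSketch and its methods are hypothetical names, not the patch's code. A directory that is genuinely empty in S3 is not modelled here, which is exactly the open question for DynamoDB.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: after pruning old files, forget any directory
// whose tracked files have all been pruned.
public class DirPruneSketch {

    // file path -> modification time in epoch millis
    private final Map<String, Long> fileModTimes = new HashMap<>();
    // directory path -> files tracked under it
    private final Map<String, Set<String>> dirChildren = new HashMap<>();

    void putFile(String dir, String file, long modTimeMillis) {
        fileModTimes.put(file, modTimeMillis);
        dirChildren.computeIfAbsent(dir, d -> new HashSet<>()).add(file);
    }

    void prune(long cutoffMillis) {
        // 1. Drop files modified before the cutoff.
        fileModTimes.values().removeIf(t -> t < cutoffMillis);
        // 2. Remove pruned files from each directory's child set.
        for (Set<String> children : dirChildren.values()) {
            children.retainAll(fileModTimes.keySet());
        }
        // 3. Forget directories that no longer track any files.
        dirChildren.values().removeIf(Set::isEmpty);
    }

    boolean hasDir(String dir) {
        return dirChildren.containsKey(dir);
    }

    boolean hasFile(String file) {
        return fileModTimes.containsKey(file);
    }
}
```

In this model, a directory with one old and one recent file keeps its record, while a directory whose only file is old disappears along with that file.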

All tests pass, apart from failures already documented in other JIRAs. For a while, many tests failed at the
S3AFileStatus type assertion in PathMetadataDynamoDBTranslation.pathMetadataToItem. We do have a lot of plain
FileStatus instances (FileStatus is S3AFileStatus's parent class) flying around S3Guard, so I'm surprised the
failure isn't consistent, but today all the tests pass. I can't see how anything I changed in this patch would
affect it, so I'm just throwing this out there in case others have seen it or have any insight.
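A minimal sketch of why such an assertion fails intermittently, assuming it is an instanceof-style check on the argument: whether it passes depends only on which concrete class reaches the method at runtime. BaseStatus, S3AStatus and toItem are hypothetical stand-ins for FileStatus, S3AFileStatus and pathMetadataToItem; they are not the real Hadoop classes.

```java
// Hypothetical sketch: BaseStatus and S3AStatus stand in for FileStatus and
// S3AFileStatus; toItem stands in for pathMetadataToItem's type assertion.
public class TypeAssertSketch {

    static class BaseStatus {
        final String path;

        BaseStatus(String path) {
            this.path = path;
        }
    }

    static class S3AStatus extends BaseStatus {
        final boolean emptyDir;

        S3AStatus(String path, boolean emptyDir) {
            super(path);
            this.emptyDir = emptyDir;
        }
    }

    /** Accepts only the subclass; a plain parent-class instance fails the check. */
    static String toItem(BaseStatus status) {
        if (!(status instanceof S3AStatus)) {
            throw new IllegalArgumentException(
                "expected S3AStatus but got " + status.getClass().getSimpleName());
        }
        S3AStatus s3a = (S3AStatus) status;
        return s3a.path + (s3a.emptyDir ? " (empty dir)" : "");
    }

    public static void main(String[] args) {
        System.out.println(toItem(new S3AStatus("/a", false)));
        try {
            toItem(new BaseStatus("/b"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints "/a" followed by "expected S3AStatus but got BaseStatus". Since both objects share the same static type, a code path that only sometimes constructs the parent class would make the failure look nondeterministic.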

> CLI command to prune old metadata
> ---------------------------------
>                 Key: HADOOP-14041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14041-HADOOP-13345.001.patch
> Add a CLI command that allows users to specify an age at which to prune metadata that
hasn't been modified for an extended period of time. Since the primary use-case targeted at
the moment is list consistency, it would make sense (especially when authoritative=false)
to prune metadata that is expected to have become consistent a long time ago.

This message was sent by Atlassian JIRA

