hadoop-common-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12374) Description of hdfs expunge command is confusing
Date Mon, 07 Sep 2015 15:55:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733865#comment-14733865 ]

Daniel Templeton commented on HADOOP-12374:

> User can go to reference link to understand what is checkpoint and what does this command

The provided link is to the HDFS architecture guide.  In that guide, if you search for "trash",
it tells you that deleting a file will actually move it into the trash.  True, but not helpful.
 If you search for "checkpoint", it tells you about the NN's edit log checkpointing.  Also
true, but not helpful.  The only thing I was able to find that clarified in specific terms
what a trash checkpoint is, is the source code.  Googling around a bit, there are a couple
of forum and blog posts here and there that do explain bits of how the trash works in HDFS,
and taken together you get a pretty clear picture, but that's a handful of hits in a sea of
"it empties the trash."

My point is that this is an excellent opportunity to create a useful source of documentation
on what happens when you use -expunge.  I think these docs would be much more helpful if they
said something like:

* The trash folder is divided into "checkpoints" that contain the files deleted during given
time windows
* Every fs.trash.checkpoint.interval minutes, HDFS will create a new checkpoint, and all files
subsequently deleted will go there
* Every fs.trash.interval minutes, HDFS will delete all checkpoints older than fs.trash.interval
and then create a new checkpoint
* hdfs -expunge will cause HDFS to delete all checkpoints older than fs.trash.interval
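
To make the model in the bullets above concrete, here is a rough sketch in Python. It is not the HDFS implementation; the function name checkpoints_to_expunge and the sample timestamps are made up for illustration, and it only models the rule "delete checkpoints older than fs.trash.interval" as described above.

```python
from datetime import datetime, timedelta


def checkpoints_to_expunge(checkpoints, now, trash_interval_minutes):
    """Return the checkpoints older than fs.trash.interval (in minutes).

    Hypothetical helper: models the rule described in the list above,
    not the actual HDFS trash code.
    """
    cutoff = now - timedelta(minutes=trash_interval_minutes)
    return [created for created in checkpoints if created < cutoff]


# Illustrative data: three checkpoints, identified by creation time.
now = datetime(2015, 9, 7, 12, 0)
checkpoints = [
    datetime(2015, 9, 6, 9, 0),   # 27 hours old
    datetime(2015, 9, 7, 6, 0),   # 6 hours old
    datetime(2015, 9, 7, 11, 0),  # 1 hour old
]

# With fs.trash.interval = 1440 (one day), only the oldest checkpoint qualifies.
expired_one_day = checkpoints_to_expunge(checkpoints, now, 1440)

# With fs.trash.interval = 10080 (one week), nothing qualifies, so under this
# model -expunge would remove nothing even though the trash is not empty.
expired_one_week = checkpoints_to_expunge(checkpoints, now, 10080)
```

The second call is the case that makes "it empties the trash" misleading: with a long fs.trash.interval, no checkpoint is old enough to be deleted.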

I didn't think too hard about the phrasing, but you get my point.  Provide enough information
that a user can understand what a checkpoint is and why they'd want to expunge one without
having to go on a Googlequest or read source code.

> Description of hdfs expunge command is confusing
> ------------------------------------------------
>                 Key: HADOOP-12374
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12374
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: documentation, trash
>    Affects Versions: 2.7.0, 2.7.1
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Trivial
>              Labels: docuentation, newbie, suggestions, trash
>         Attachments: HADOOP-12374.001.patch
> Usage: hadoop fs -expunge
> Empty the Trash. Refer to the HDFS Architecture Guide for more information on the Trash
> This description is confusing. It gives users the impression that this command will empty
> the trash, but it actually only removes old checkpoints. If a user sets a long value for
> fs.trash.interval, this command will not remove anything until checkpoints have existed
> longer than this value.

