hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10756) Expose getTrashRoot to HTTPFS and WebHDFS
Date Wed, 17 Aug 2016 04:57:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423880#comment-15423880 ]

Xiao Chen commented on HDFS-10756:

Thanks [~yuanbo] for looking into this.
bq. Actually I find that EZ file can not be moved from EZ to a trash directory under the same
EZ while using hadoop client.
Could you give an example of what you mean by this?
From the CLI, {{hdfs dfs -rm /ez/file}} renames ({{mv}}) the file into a trash directory
(/ez/.Trash/$USER/file, as determined by getTrashRoot), and {{hdfs dfs -rm -skipTrash
/ez/file}} permanently deletes the file.

The issue I'm reporting here is that, for a webhdfs/httpfs client (imagine a Python script
accessing HDFS), since the [REST API|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html]
does not support trash, a file can only be deleted permanently. Without EZ, the client can
work around this by renaming the file into {{/user/username/.Trash}}. But EZ restricts such a rename
operation: the file has to be moved to {{/ez1/.Trash/$USER}}, which the client has no way of
knowing. So my proposal here is to add {{getTrashRoot}} so that webhdfs/httpfs clients know where
to rename the file to, without worrying about whether the file is in an EZ.
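
To make the workaround concrete, here is a minimal Python sketch (not part of the original comment) that builds the WebHDFS RENAME request for moving a file into the user's default trash directory. The host, port, user, and helper names are hypothetical, and the sketch only constructs the URL; it is valid only for files outside an EZ, which is exactly the limitation described above.

```python
import posixpath
from urllib.parse import quote, urlencode


def default_trash_target(src, user):
    """Destination under /user/<user>/.Trash; only usable outside an EZ."""
    return posixpath.join("/user", user, ".Trash", src.lstrip("/"))


def rename_url(base, src, dst, user):
    """Build the URL for a WebHDFS RENAME, which is issued as an HTTP PUT."""
    query = urlencode({"op": "RENAME", "destination": dst, "user.name": user})
    return f"{base}/webhdfs/v1{quote(src)}?{query}"


# Hypothetical NameNode address and user, for illustration only.
url = rename_url(
    "http://namenode.example.com:50070",
    "/data/file",
    default_trash_target("/data/file", "alice"),
    "alice",
)
print(url)
```

A client would send this URL with an HTTP PUT (for example via {{urllib.request.Request(url, method="PUT")}}). The destination's parent directory probably has to exist first, so a real script would likely issue a MKDIRS call beforehand.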

An alternative would be to add a {{-moveToTrash}} option to the {{delete}} API on webhdfs/httpfs,
but I'm not sure whether that's feasible.

> Expose getTrashRoot to HTTPFS and WebHDFS
> -----------------------------------------
>                 Key: HDFS-10756
>                 URL: https://issues.apache.org/jira/browse/HDFS-10756
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: encryption, httpfs, webhdfs
>            Reporter: Xiao Chen
> Currently, the Hadoop FileSystem API has [getTrashRoot|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L2708]
to determine the trash directory at run time. The default trash directory is under {{/user/$USER}}.
> For an encrypted file, since moving files between, into, or out of EZs is not allowed, when
an EZ file is deleted via the CLI, it calls into the [DFS implementation|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L2485]
to move the file to a trash directory under the same EZ.
> This works perfectly fine for CLI users or Java users who call the FileSystem API. But
users of httpfs/webhdfs currently have no way to figure out what the trash root would
be. This jira proposes adding such an interface to httpfs and webhdfs.
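
The rule described in the issue can be sketched as a small Python function. This is only an illustration of the decision, not the actual Hadoop code: the function name and the {{ez_roots}} argument are hypothetical, and a real webhdfs/httpfs client does not know the EZ roots, which is why a server-side interface is being proposed.

```python
import posixpath


def trash_root(path, user, ez_roots=()):
    """Return the trash root for `path` under the rule described above.

    `ez_roots` is a hypothetical list of encryption zone root directories.
    """
    for ez in ez_roots:
        ez = posixpath.normpath(ez)
        prefix = ez if ez == "/" else ez + "/"
        # A path inside an EZ must stay in that EZ, so its trash lives there too.
        if path == ez or path.startswith(prefix):
            return posixpath.join(ez, ".Trash", user)
    # Outside any EZ, fall back to the default location under the home directory.
    return posixpath.join("/user", user, ".Trash")


print(trash_root("/ez1/file", "alice", ["/ez1"]))
print(trash_root("/data/file", "alice", ["/ez1"]))
```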
