hadoop-hdfs-issues mailing list archives

From "Xiaoyu Yao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8747) Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption Zones
Date Fri, 31 Jul 2015 17:09:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649486#comment-14649486 ]

Xiaoyu Yao commented on HDFS-8747:

Thanks [~andrew.wang] for reviewing. 

bq. Have you thought about simply allowing rename between EZs with the same settings? This
would be a much smaller and easier change with similar properties. Your proposal I think is
still better in terms of ease-of-use and also ensuring security invariants around key rolling
(if/when we implement that).

Yes, we've discussed this simpler workaround, but it has many limitations, such as the security
invariants you mentioned above. We don't want to force different EZs to share the same zone
key just to support rename, as they may have different policies. An encryption zone, as a security
concept, should be managed consistently as a single entity. Based on that, supporting
additional roots in an encryption zone is a natural enhancement and a better solution.

bq. If we keep the APIs superuser-only, how does a normal user add their trash folder to an
EZ? Same for scratch folders, e.g. if the Hive user is not a superuser.

I think we should keep this API superuser-only; it is still useful with that restriction.
The trash folder or scratch folder can be pre-created and added to the encryption
zone by the superuser as needed. This removes the limitation for the Hive scratch folder, which currently
has to be configured under the single root of the encryption zone. We can discuss this further
in HDFS-8831.

> Provide Better "Scratch Space" and "Soft Delete" Support for HDFS Encryption Zones
> ----------------------------------------------------------------------------------
>                 Key: HDFS-8747
>                 URL: https://issues.apache.org/jira/browse/HDFS-8747
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: encryption
>    Affects Versions: 2.6.0
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf, HDFS-8747-07292015.pdf
> HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow creating an
encryption zone on top of a single HDFS directory. Files under the root directory of the encryption
zone are encrypted/decrypted transparently upon HDFS client write or read operations.

> Generally, it does not support rename (without data copying) across encryption zones or
between an encryption zone and a non-encryption zone because of the different security settings of encryption
zones. However, there are certain use cases where efficient rename support is desired. This
JIRA proposes better support for two such use cases, “Scratch Space” (a.k.a. staging
area) and “Soft Delete” (a.k.a. trash), with HDFS encryption zones.
> “Scratch Space” is widely used in Hadoop jobs, which require efficient rename support.
Temporary files from MR jobs are usually stored in a staging area outside the encryption zone, such
as the “/tmp” directory, and then renamed to the target directories once the data
is ready to be further processed.
> Below is a summary of supported/unsupported cases in the latest Hadoop:
> * Rename within the encryption zone is allowed.
> * Rename of the entire encryption zone by moving the root directory of the zone is allowed.
> * Rename sub-directory/file from encryption zone to non-encryption zone is not allowed.
> * Rename sub-directory/file from encryption zone A to encryption zone B is not allowed.
> * Rename from non-encryption zone to encryption zone is not allowed.
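The rules listed above can be sketched as a small predicate. This is only an illustrative model, not HDFS code: the zone names, the paths, and the helpers `zone_of` and `rename_allowed` are hypothetical, and the sketch does not model where a zone root may be moved to.

```python
from typing import Optional

# Hypothetical 1:1 mapping: each encryption zone has exactly one root directory.
ZONE_ROOTS = {"/secure/a": "zoneA", "/secure/b": "zoneB"}


def zone_of(path: str) -> Optional[str]:
    """Return the zone whose root is `path` or an ancestor of it, else None."""
    for root, zone in ZONE_ROOTS.items():
        if path == root or path.startswith(root + "/"):
            return zone
    return None


def rename_allowed(src: str, dst: str) -> bool:
    """Apply the summarized rules to a (source, destination) pair."""
    if src in ZONE_ROOTS:
        # Moving the whole zone by renaming its root directory is allowed.
        # (Destination constraints are not modeled in this sketch.)
        return True
    # Otherwise both paths must be in the same zone, or both outside any zone.
    return zone_of(src) == zone_of(dst)
```

For example, `rename_allowed("/tmp/x", "/secure/a/x")` evaluates to `False`, which is the case that blocks staging data in “/tmp”.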
> “Soft delete” (a.k.a. trash) is a client-side feature that helps
prevent accidental deletion of files and directories. If trash is enabled and a file or directory
is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's
home directory instead of being deleted. Deleted files are initially moved (renamed) to the
Current sub-directory of the .Trash directory with the original path preserved. Files and
directories in the trash can be restored simply by moving them to a location outside the .Trash
directory.
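As a rough illustration of the path mapping described above, the trash location of a deleted file can be computed as follows. The helper `trash_path` is hypothetical, and the sketch assumes the home directory is `/user/<name>`, which may differ in a given deployment.

```python
import posixpath


def trash_path(user: str, deleted: str) -> str:
    """Map a deleted path to its location under the user's .Trash/Current."""
    # Assumption: the user's home directory is /user/<name>.
    home = posixpath.join("/user", user)
    # The original absolute path is preserved below .Trash/Current.
    return posixpath.join(home, ".Trash", "Current", deleted.lstrip("/"))
```

Under these assumptions, `trash_path("alice", "/data/report.csv")` gives `/user/alice/.Trash/Current/data/report.csv`. When the deleted file lives inside an encryption zone, this target lies outside the zone, so the move to trash is a rename across the zone boundary.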
> Due to the limited rename support, deleting a sub-directory or file within an encryption zone with
the trash feature enabled is not allowed. Clients have to use the -skipTrash option to work around this. HADOOP-10902
and HDFS-6767 improved the error message but did not provide a complete solution to the problem.
> We propose to solve the problem by generalizing the mapping between an encryption zone and
its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapping
directories such as scratch space or soft delete "trash" locations to be added/removed dynamically
after creation. This way, rename for "scratch space" and "soft delete" can be better supported
without breaking the assumption that rename is only supported "within the zone".
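A minimal sketch of the proposed 1:N mapping is shown here, under the assumption that a zone is simply a named set of non-overlapping root directories. The class `ZoneRegistry` and its methods are hypothetical names for illustration and do not correspond to any existing HDFS API.

```python
from typing import Dict, List, Optional


def _is_under(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies below it."""
    return path == root or path.startswith(root.rstrip("/") + "/")


class ZoneRegistry:
    """Hypothetical model: one encryption zone maps to N root directories."""

    def __init__(self) -> None:
        self._roots: Dict[str, List[str]] = {}

    def create_zone(self, zone: str, root: str) -> None:
        if zone in self._roots:
            raise ValueError(f"zone {zone} already exists")
        self._roots[zone] = []
        self.add_root(zone, root)

    def add_root(self, zone: str, root: str) -> None:
        # Roots must not overlap any existing root of any zone.
        for roots in self._roots.values():
            for existing in roots:
                if _is_under(root, existing) or _is_under(existing, root):
                    raise ValueError(f"{root} overlaps {existing}")
        self._roots[zone].append(root)

    def remove_root(self, zone: str, root: str) -> None:
        self._roots[zone].remove(root)

    def zone_of(self, path: str) -> Optional[str]:
        for zone, roots in self._roots.items():
            if any(_is_under(path, r) for r in roots):
                return zone
        return None

    def rename_allowed(self, src: str, dst: str) -> bool:
        # Rename stays "within the zone" even when the roots differ.
        return self.zone_of(src) == self.zone_of(dst)
```

With a zone created on `/warehouse` and a scratch root `/tmp/hive-scratch` added to it, a rename from the scratch root into `/warehouse` passes the check because both paths resolve to the same zone.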

This message was sent by Atlassian JIRA
