hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12294) Let distcp to bypass external attribute provider when calling getFileStatus etc at source cluster
Date Wed, 16 Aug 2017 18:49:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129256#comment-16129256 ]

Yongjun Zhang commented on HDFS-12294:

Hi [~chris.douglas],

I thought I had updated here, but it seems I did not finish.

The requirement is: if a file has attribute X in the fsimage, but the external provider
overrides it to Y, we want distcp to copy X instead of Y.
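
To make the requirement concrete, here is a minimal Java sketch. It is not HDFS code: the
ExternalProvider interface, the resolve method, and the bypass flag are hypothetical names
used only to illustrate the X-versus-Y behavior described above.

```java
import java.util.Optional;

public final class AttrResolutionSketch {
    /** Hypothetical stand-in for an external attribute provider (not the HDFS API). */
    public interface ExternalProvider {
        /** Returns an overriding value, or empty to keep the stored one. */
        Optional<String> override(String path, String storedValue);
    }

    /**
     * Resolves the attribute a caller sees.
     * With bypass set, the value stored in the fsimage (X) is returned untouched;
     * otherwise the provider may replace it (Y).
     */
    public static String resolve(String path, String storedValue,
                                 ExternalProvider provider, boolean bypass) {
        if (bypass) {
            return storedValue;
        }
        return provider.override(path, storedValue).orElse(storedValue);
    }

    public static void main(String[] args) {
        // Assumed provider that always overrides the stored value with "Y".
        ExternalProvider provider = (path, stored) -> Optional.of("Y");
        System.out.println(resolve("/a/b/c", "X", provider, false)); // normal client view
        System.out.println(resolve("/a/b/c", "X", provider, true));  // what distcp should copy
    }
}
```

In this sketch a normal client sees Y, while a bypassing caller such as distcp sees X.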

The external provider can both add and remove attributes. My worry is that filtering by user
name would be hacky and might not even work, since the same user can request external attributes
when not running distcp. Even when running distcp, some calls may still need the external
attributes, such as permission checks.

In addition, any other user is supposed to be able to run distcp too.

Do you agree?


> Let distcp to bypass external attribute provider when calling getFileStatus etc at source
> -------------------------------------------------------------------------------------------------
>                 Key: HDFS-12294
>                 URL: https://issues.apache.org/jira/browse/HDFS-12294
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
> This is an alternative solution for HDFS-12202, which proposed introducing a new set
> of APIs with an additional boolean parameter, bypassExtAttrProvider, to let the NN bypass the
> external attribute provider on getFileStatus. The goal is to prevent distcp from copying attributes
> from one cluster's external attribute provider and saving them to another cluster's fsimage.
> The solution here is, instead of adding a parameter, to encode it in the path itself.
> When getFileStatus (and some other calls) is invoked, the NN parses the path
> and figures out whether the external attribute provider needs to be bypassed. The suggested
> encoding is to add a prefix to the path before calling getFileStatus, e.g. /a/b/c becomes
> /.reserved/bypassExtAttr/a/b/c. The NN parses the path at the very beginning.
> Thanks much to [~andrew.wang] for this suggestion. The scope of change is smaller and
> we don't have to change the FileSystem APIs.
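
The prefix encoding described in the issue can be sketched in Java as follows. This is an
illustration under assumptions, not the actual patch: only the /.reserved/bypassExtAttr
prefix comes from the description, while the class and method names (BypassPathSketch,
encode, isBypass, strip) are hypothetical.

```java
public final class BypassPathSketch {
    // Prefix taken from the issue description; everything else here is hypothetical.
    public static final String BYPASS_PREFIX = "/.reserved/bypassExtAttr";

    /** Client side (e.g. distcp): mark an absolute path so the provider is bypassed. */
    public static String encode(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("absolute path required: " + path);
        }
        return path.equals("/") ? BYPASS_PREFIX : BYPASS_PREFIX + path;
    }

    /** NN side: true if the path carries the bypass prefix as a whole path component. */
    public static boolean isBypass(String path) {
        return path.equals(BYPASS_PREFIX) || path.startsWith(BYPASS_PREFIX + "/");
    }

    /** NN side: recover the real path; paths without the prefix are returned unchanged. */
    public static String strip(String path) {
        if (!isBypass(path)) {
            return path;
        }
        String rest = path.substring(BYPASS_PREFIX.length());
        return rest.isEmpty() ? "/" : rest;
    }

    public static void main(String[] args) {
        String encoded = encode("/a/b/c");
        System.out.println(encoded);                                    // prefixed path
        System.out.println(isBypass(encoded) + " " + strip(encoded));   // bypass requested
        System.out.println(isBypass("/a/b/c") + " " + strip("/a/b/c")); // ordinary path
    }
}
```

Because the flag travels inside the path string, a caller can request the bypass through the
existing getFileStatus signature, which is the point made about not changing the FileSystem APIs.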

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
