hadoop-hdfs-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12202) Provide new set of FileSystem API to bypass external attribute provider
Date Thu, 10 Aug 2017 00:05:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120861#comment-16120861 ]

Chris Douglas commented on HDFS-12202:
--------------------------------------

bq. I wish there were a solution that lets us avoid modifying the HDFS API
Modifying the FileSystem API to support this single, narrow use case is not a good tradeoff.
Let's find some reasonable workaround(s).

bq. the same user may run applications other than distcp too
bq. ANY user can run distcp, and distcp can happen within the same cluster too. If we want to
restrict these, would it be too restrictive?
Is it so restrictive? In trunk, the new shell scripts can swap out distcp with another utility
that could delegate the request to a service. Running a dedicated service for a deployment
with this _very_ specific constraint is not so dire.

bq. we have the problem of not knowing when to pass through, because only the user knows when
to pass through, and we don't have a way to bridge the gap between the user (who accesses only the
FileSystem API) and the pass-through API of the external attribute provider
Is filtering sufficient? Or does this also need to add attributes that the external attribute
provider strips out? If distcp only needs to filter out some extended attributes, then the
client can do this without cooperation from HDFS.
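A minimal sketch of that client-side filtering, assuming the copy tool sees a source file's extended attributes as a name-to-value map (the shape `FileSystem#getXAttrs(Path)` returns) and that the provider-supplied attributes can be recognized by a name prefix. The class `XAttrFilter` and the prefix `user.provider.` are hypothetical; a real tool would then write the surviving entries to the target with `FileSystem#setXAttr`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side filter: drops xattrs whose names match excluded prefixes. */
public class XAttrFilter {

  /**
   * Returns a copy of {@code xattrs} without any entry whose name starts with
   * one of {@code excludedPrefixes}. Iteration order of the input is preserved.
   */
  public static Map<String, byte[]> filter(Map<String, byte[]> xattrs,
                                           List<String> excludedPrefixes) {
    Map<String, byte[]> kept = new LinkedHashMap<>();
    for (Map.Entry<String, byte[]> e : xattrs.entrySet()) {
      boolean excluded = false;
      for (String prefix : excludedPrefixes) {
        if (e.getKey().startsWith(prefix)) {
          excluded = true;
          break;
        }
      }
      if (!excluded) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, byte[]> src = new LinkedHashMap<>();
    src.put("user.owner-team", "etl".getBytes(StandardCharsets.UTF_8));
    // Pretend the external provider injected this one (hypothetical name).
    src.put("user.provider.tag", "pii".getBytes(StandardCharsets.UTF_8));
    Map<String, byte[]> out = filter(src, Arrays.asList("user.provider."));
    System.out.println(out.keySet()); // prints [user.owner-team]
  }
}
```

This only covers the "filter out" case; if distcp also needs attributes that the provider hides, no client-side filter can recover them.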

> Provide new set of FileSystem API to bypass external attribute provider
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12202
>                 URL: https://issues.apache.org/jira/browse/HDFS-12202
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs, hdfs-client
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> HDFS client uses 
> {code}
>   /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus getFileStatus(Path f) throws IOException;
>   /**
>    * List the statuses of the files/directories in the given path if the path is
>    * a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
>    * @param f given path
>    * @return the statuses of the files/directories in the given path
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException,
>                                                          IOException;
> {code}
> to get FileStatus of files.
> When an external attribute provider (INodeAttributeProvider) is enabled for a cluster, the
> external attribute provider is consulted for some relevant info (including ACL, group,
> etc.), which is then returned in the FileStatus.
> There is a problem here: when we use distcp to copy files from srcCluster to tgtCluster,
> if srcCluster has an external attribute provider enabled, the copied data would contain data
> from the attribute provider, which we may not want.
> This jira is created to add a new set of interfaces for distcp to use, so that distcp can
> copy only HDFS data and bypass the external attribute provider's data.
> The new set of APIs would look like:
> {code}
>   /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @param bypassExtAttrProvider if true, bypass the external attribute
>    *        provider when it's in use.
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public FileStatus getFileStatus(Path f,
>       final boolean bypassExtAttrProvider) throws IOException;
>   /**
>    * List the statuses of the files/directories in the given path if the path is
>    * a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
>    * @param f given path
>    * @param bypassExtAttrProvider if true, bypass the external attribute
>    *        provider when it's in use.
>    * @return the statuses of the files/directories in the given path
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public FileStatus[] listStatus(Path f,
>       final boolean bypassExtAttrProvider) throws FileNotFoundException,
>                                                   IOException;
> {code}
> So when bypassExtAttrProvider is true, the external attribute provider will be bypassed.
> Thanks.
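
To make the proposed flag's semantics concrete, here is a toy, in-memory model of the lookup path: the stored status comes from the namespace, and unless the caller asks to bypass, a configured provider overlays its own values. None of these are Hadoop classes; `SimpleStatus`, `AttributeOverlay`, and `StatusLookup` are hypothetical stand-ins for `FileStatus`, `INodeAttributeProvider`, and the NameNode lookup, and only owner/group are modeled.

```java
import java.util.HashMap;
import java.util.Map;

public class BypassModel {

  /** Hypothetical stand-in for FileStatus: only the fields a provider may change. */
  static final class SimpleStatus {
    final String owner;
    final String group;

    SimpleStatus(String owner, String group) {
      this.owner = owner;
      this.group = group;
    }

    @Override
    public String toString() {
      return owner + ":" + group;
    }
  }

  /** Hypothetical stand-in for an external attribute provider. */
  interface AttributeOverlay {
    SimpleStatus apply(String path, SimpleStatus stored);
  }

  /** Hypothetical namespace lookup that honors a bypass flag. */
  static final class StatusLookup {
    private final Map<String, SimpleStatus> namespace = new HashMap<>();
    private final AttributeOverlay overlay; // null when no provider is configured

    StatusLookup(AttributeOverlay overlay) {
      this.overlay = overlay;
    }

    void put(String path, SimpleStatus status) {
      namespace.put(path, status);
    }

    SimpleStatus getFileStatus(String path, boolean bypassExtAttrProvider) {
      SimpleStatus stored = namespace.get(path);
      if (stored == null) {
        throw new IllegalArgumentException("No such path: " + path);
      }
      // Consult the provider only when one is configured and the caller did not opt out.
      if (overlay == null || bypassExtAttrProvider) {
        return stored;
      }
      return overlay.apply(path, stored);
    }
  }

  public static void main(String[] args) {
    // Hypothetical provider that reports every file as belonging to group "analytics".
    StatusLookup fs = new StatusLookup((p, s) -> new SimpleStatus(s.owner, "analytics"));
    fs.put("/data/f1", new SimpleStatus("alice", "hdfs"));
    System.out.println(fs.getFileStatus("/data/f1", false)); // prints alice:analytics
    System.out.println(fs.getFileStatus("/data/f1", true));  // prints alice:hdfs
  }
}
```

The flag only decides whether the overlay runs; the bypassed result is whatever the namespace itself stored.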



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


