hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2006) ability to support storing extended attributes per file
Date Mon, 21 Apr 2014 18:30:26 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975841#comment-13975841
] 

Chris Nauroth commented on HDFS-2006:
-------------------------------------

Hello, [~hitliuyi].  Thank you for taking on this issue.  I read through the design document
and initial patch, and it looks like a great start.  Here are a few questions.

# Is there an intention to integrate with the ACLs work done in HDFS-4685?  For example, on
Linux, the getfattr command can present POSIX ACLs as xattrs in the system namespace.  I haven't
fully explored use of getfattr/setfattr in combination with ACLs, so I don't know the details
of how this works on Linux yet.  The most important piece of this would be to guarantee that
setfattr cannot circumvent any important security logic enforced by setfacl, and write tests
for it.
# bq. Whether inode has a XAttr is through a bit of int which is shared in inode.
Can you elaborate on this?  This sounded like using a bit somewhere as a flag to indicate
presence of xattrs, but I didn't spot anything like that in my initial scan of the patch.
 I just saw the new inode feature, which was basically implemented as I would have expected.
# bq. In current version, large value will be stored in NN memory, unless client defines its
own reference format, have external storage and can fetch content through reference.
Can you please clarify this part?  Is the intention that resolution of the xattr reference
is somehow pluggable inside the NN, or is this just talking about a general strategy of using
an xattr to persist a foreign key to some external system, with the client then driving lookup
at that external system?
# Could you provide more details on the memory calculation?  For example, what numbers are
assumed for current inode size, average number of xattrs per inode, and average size of name
and value for the individual xattrs?
# I was confused by the {{getXAttrs}} API that accepts {{List<Xattr>}}.  Looking at
the implementation in {{FSNamesystem#getXAttrs}}, it appears that the get operation also may
optionally mutate the xattrs.  Do I understand correctly?  If so, can we remove this and make
the get operation focus solely on returning the existing xattrs?
# Right now, if a caller attempts to set multiple xattrs, then the authorized ones are accepted,
and the unauthorized ones get filtered out and ignored silently.  This could be problematic
for error handling scenarios.  For example, if we imagine a member of the super-group modifying
xattrs in the system namespace successfully, and then that user gets removed from the super-group,
then their calls will start failing to apply the xattrs silently, but no error will be propagated
back to the caller to inform them that something is broken now.  Perhaps this scenario should
throw {{AccessControlException}} instead.
# Do you intend to enforce any conservative upper limits to try to mitigate against someone
maliciously or accidentally bloating the NN memory footprint by writing huge data into xattrs?
 Are there upper limits on number of xattrs on a single inode, number of bytes in an xattr
name and number of bytes in an xattr value?
# The current patch appears to include xattrs in the method signatures for the internal mkdir
methods, but I didn't notice a need for this.  Was this modeled after the code changes for
ACLs?  For POSIX ACLs, we needed to support the notion of a default ACL, which is an ACL copied
from the parent directory to the child sub-directory or file at time of creation.  In the
implementation, this required passing along the list of ACL entries to copy in some of the
internal methods.  I'm not aware of general support for this kind of behavior across all xattrs,
so if it doesn't exist and there is no use case for it, then I recommend removing this part
of the code.
# The design states a desire to preserve xattrs using distcp, but it also states that WebHDFS
is not yet in scope.  WebHDFS is very commonly used as a distcp target though, so I wonder
if we should reconsider.
# I assume that you will not support getting and setting xattrs via NFS mount.  (i.e. We won't
implement the case of someone calling Linux getattr/setattr on a path that is really under
an NFS mount point to an HDFS cluster.)  That's the choice we made for initial development
of getfacl/setfacl.  If this is the plan for xattrs too, then I just recommend stating that
in the design doc.

Thanks, again.  It looks like good work!

At this point, I recommend establishing a HDFS-2006 feature branch, so that you can start
breaking up portions of the patch, and getting them through finer-grained code review and
commit.  We'll need to rally help from a PMC member.  If a PMC member is listening, would
you please create an HDFS-2006 branch?  Thanks!

> ability to support storing extended attributes per file
> -------------------------------------------------------
>
>                 Key: HDFS-2006
>                 URL: https://issues.apache.org/jira/browse/HDFS-2006
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: dhruba borthakur
>            Assignee: Yi Liu
>         Attachments: HDFS-XAttrs-Design-1.pdf, xattrs.1.patch, xattrs.patch
>
>
> It would be nice if HDFS provides a feature to store extended attributes for files, similar
to the one described here: http://en.wikipedia.org/wiki/Extended_file_attributes. 
> The challenge is that it has to be done in such a way that a site not using this feature
does not waste precious memory resources in the namenode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message