hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus serialize itself via protobuf
Date Thu, 04 May 2017 00:54:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995986#comment-15995986
] 

Andrew Wang commented on HDFS-6984:
-----------------------------------

Thanks for the rev Chris. Looks like this latest patch has grown some new functionality.

bq. For backwards compatibility with 2.x clients, we can either keep setting the permission
bits server side or backport the flags to branch-2 for the "2.x version compatible with 3.x
clusters" release. Any opinions on this? It's simplest to keep setting the bits server-side,
and deleting that code when we remove FsPermissionExtension.

I'd like to keep setting the bits server-side. There's no guarantee whether 2.9.0 or 3.0.0
will GA first, and if we have the option of avoiding incompatibility, I'd like to take it.

bq. I also had a question about snapshots. I'm not certain whether EC and encryption zones
work with them. The current patch doesn't handle ACLs, but I wanted to get a quick check on
that before digging through the implementation.

Do you have a unit test for this behavior? Bit surprised that this doesn't work. If it's not
a regression, we can handle this in a different JIRA.

bq. The JSON code tried to preserve the acl/crypt/ec bits of the FsPermission on AclStatus.
This seems incorrect, but I can try to find a solution if it is meaningful.

Could you give a pointer to this? These bits are new to the JSON code, so it'd be good to
catch any bugs quickly.

A high-level comment:

I'm worried about making HdfsFileStatus extend FileStatus, particularly the link target. HdfsFileStatus
is used on the NameNode, and there is potentially some mangling from turning the link target
String into a Path. The target is supposed to be an opaque value within the filesystem, left
to be interpreted by the client.

How bad is it to split this change out? Wasn't clear what we gain by having HdfsFileStatus
extend FileStatus.

Some nits:

* HttpsFileSystem: Please do not use a wildcard import
* Double ";" in PBHelperClient:

{code}
import org.apache.hadoop.hdfs.protocol.FsPermissionExtension;;
{code}

* Extra import in SnapshotInfo

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, HDFS-6984.003.patch, HDFS-6984.004.patch,
HDFS-6984.005.patch, HDFS-6984.006.patch, HDFS-6984.007.patch, HDFS-6984.008.patch, HDFS-6984.009.patch,
HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this to serialize
it and send it over the wire.  But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}}
which serves to serialize this information.  The protobuf form is preferable, since it allows
us to add new fields in a backwards-compatible way.  Another issue is that already a lot of
subclasses of FileStatus don't override the Writable methods of the superclass, breaking the
interface contract that read(status.write) should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so that we
don't have to deal with these issues.  It's probably too late to do this in Hadoop 2, since
user code may be relying on the existing FileStatus serialization there.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message