hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus serialize itself via protobuf
Date Wed, 15 Mar 2017 21:32:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927021#comment-15927021 ]

Andrew Wang commented on HDFS-6984:

Hi Chris, thanks for revving. A few review comments:

I was wondering if you saw my comment way back about the ACL bit / encrypted bit / etc.:

bq. The takeaways for me are that we should make a separate bitfield for these flags. If we
want to preserve cross-serialization, we'd need to also add this field to HdfsFileStatus,
and we'd always have to be careful with field numbers.

Right now, it looks like if you pass in an HdfsFileStatus with these bits set, they're dropped.
It'd be good to unit test these getters. If you can think up a unit test to detect the addition
of new bits (e.g. isErasureCoded), that'd also be great.
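
As a rough illustration of the separate bitfield and the "detect new bits" test, here is a minimal, self-contained sketch. Every name in it (StatusFlags, Flag, HAS_ACL, HAS_CRYPT, HAS_EC, the mask values) is hypothetical and not taken from the patch or from Hadoop. The idea is that the check iterates over all declared flags, so a newly added flag is covered without anyone editing the test.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; none of these names or mask values come from Hadoop. */
final class StatusFlags {
    private StatusFlags() {}

    /** Attribute bits with explicit masks, so reordering constants never changes the wire value. */
    enum Flag {
        HAS_ACL(0x01), HAS_CRYPT(0x02), HAS_EC(0x04);

        final int mask;

        Flag(int mask) {
            this.mask = mask;
        }
    }

    /** Pack a set of flags into a single int bitfield. */
    static int encode(Set<Flag> flags) {
        int bits = 0;
        for (Flag f : flags) {
            bits |= f.mask;
        }
        return bits;
    }

    /** Unpack a bitfield; bits with no matching constant are ignored. */
    static EnumSet<Flag> decode(int bits) {
        EnumSet<Flag> out = EnumSet.noneOf(Flag.class);
        for (Flag f : Flag.values()) {
            if ((bits & f.mask) != 0) {
                out.add(f);
            }
        }
        return out;
    }

    /** True if every declared flag owns exactly one unused bit and survives encode then decode. */
    static boolean everyFlagRoundTrips() {
        int seen = 0;
        for (Flag f : Flag.values()) {
            if (Integer.bitCount(f.mask) != 1 || (seen & f.mask) != 0) {
                return false; // mask is not a single bit, or collides with an earlier flag
            }
            seen |= f.mask;
            EnumSet<Flag> only = EnumSet.of(f);
            if (!decode(encode(only)).equals(only)) {
                return false;
            }
        }
        return true;
    }
}
```

A unit test would only need to assert that StatusFlags.everyFlagRoundTrips() returns true. In the real code the same loop would have to go through the actual conversion path, so that a flag dropped during conversion makes the test fail.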

Since a lot of fields are optional in the PB, should we also test with these optional fields
unset? I'm wondering if the resulting FileStatus is filled in with reasonable defaults.
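
To make the optional-fields question concrete, here is a small sketch of the kind of check I mean. WireStatus is a hand-written stand-in for a decoded message with optional fields, not a generated protobuf class, and the default values are assumptions picked for the example rather than what FileStatus actually uses.

```java
/** Hypothetical sketch; WireStatus stands in for a decoded message, not a real generated class. */
final class StatusDefaults {
    private StatusDefaults() {}

    // Defaults chosen for this sketch only.
    static final long DEFAULT_LENGTH = 0L;
    static final int DEFAULT_PERMISSION = 0644; // octal literal
    static final String DEFAULT_OWNER = "";

    /** Decoded message: null means the sender did not set the optional field. */
    static final class WireStatus {
        Long length;
        Integer permission;
        String owner;
    }

    /** In-memory status that never exposes a null to callers. */
    static final class Status {
        final long length;
        final int permission;
        final String owner;

        Status(long length, int permission, String owner) {
            this.length = length;
            this.permission = permission;
            this.owner = owner;
        }
    }

    /** Convert a decoded message, substituting a default for each unset field. */
    static Status fromWire(WireStatus w) {
        return new Status(
            w.length != null ? w.length : DEFAULT_LENGTH,
            w.permission != null ? w.permission : DEFAULT_PERMISSION,
            w.owner != null ? w.owner : DEFAULT_OWNER);
    }
}
```

A test along these lines would convert a completely empty message, then one with only some fields set, and assert on every getter of the result.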

Generally beefing up test coverage would be good too, since it seems like we lost some of
the basic "try writing and reading some different statuses" test from TestFileStatus.

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, HDFS-6984.003.patch, HDFS-6984.004.patch, HDFS-6984.005.patch, HDFS-6984.nowritable.patch
> FileStatus was a Writable in Hadoop 2 and earlier. Originally, we used this to serialize it and send it over the wire. But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}}, which serves to serialize this information. The protobuf form is preferable, since it allows us to add new fields in a backwards-compatible way. Another issue is that many subclasses of FileStatus already don't override the Writable methods of the superclass, breaking the interface contract that read(status.write) should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so that we don't have to deal with these issues. It's probably too late to do this in Hadoop 2, since user code may be relying on the existing FileStatus serialization there.

