hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus serialize itself via protobuf
Date Wed, 01 Feb 2017 23:03:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849075#comment-15849075
] 

Andrew Wang commented on HDFS-6984:
-----------------------------------

I think this is a "user-level file format" and covered by:

https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Compatibility.html#File_formats__Metadata

{quote}
Non-forward-compatible user-file format changes are restricted to major releases. When user-file
formats change, new releases are expected to read existing formats, but may write data in
formats incompatible with prior releases. Also, the community shall prefer to create a new
format that programs must opt in to instead of making incompatible changes to existing formats.
{quote}

I think that's basically what [~stevel@apache.org] said above.

[~chris.douglas] thanks for bearing with, let's move the serde into a library and get this
committed. 

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, HDFS-6984.003.patch, HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this to serialize
it and send it over the wire.  But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}}
which serves to serialize this information.  The protobuf form is preferable, since it allows
us to add new fields in a backwards-compatible way.  Another issue is that already a lot of
subclasses of FileStatus don't override the Writable methods of the superclass, breaking the
interface contract that read(status.write) should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so that we
don't have to deal with these issues.  It's probably too late to do this in Hadoop 2, since
user code may be relying on the existing FileStatus serialization there.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message