hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6984) In Hadoop 3, make FileStatus no longer a Writable
Date Sat, 16 Jan 2016 00:22:40 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6984:
    Attachment: HDFS-6984.002.patch

I guess making it no longer Writable is probably too big a change.  DistCp and other programs
rely on being able to write out FileStatus objects and read them back later.  However, it is
really unpleasant that we can't add new fields to the serialized representation of FileStatus.

Here's a new version that resolves this dilemma by switching the serialization format for
FileStatus objects to protobuf.  This will let us add new fields to FileStatus in the future.
I think this change belongs in Hadoop 3 rather than Hadoop 2, since it is incompatible with the
previous FileStatus serialization format.
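To illustrate why a field-tagged format makes FileStatus extensible, here is a minimal, self-contained Java sketch.  It does not use the protobuf library or the code in HDFS-6984.002.patch; it hand-rolls a toy tag/length/value encoding that captures the same idea, with a hypothetical class name (TaggedStatusDemo) and hypothetical field numbers.  A newer writer emits an extra field, and an older reader that only knows fields 1 and 2 skips it by length instead of failing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class TaggedStatusDemo {
    // Hypothetical field numbers; real protobuf wire encoding differs.
    static final int FIELD_PATH = 1;
    static final int FIELD_LENGTH = 2;
    static final int FIELD_NEW_FLAG = 3; // only known to the "newer" writer

    // Writes one field as (tag, payload length, payload bytes).
    static void writeField(DataOutputStream out, int tag, byte[] payload) throws IOException {
        out.writeInt(tag);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // A newer writer that emits a field older readers don't know about.
    public static byte[] writeNewFormat(String path, long length, boolean flag) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeField(out, FIELD_PATH, path.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream lenBytes = new ByteArrayOutputStream();
        new DataOutputStream(lenBytes).writeLong(length);
        writeField(out, FIELD_LENGTH, lenBytes.toByteArray());
        writeField(out, FIELD_NEW_FLAG, new byte[] { (byte) (flag ? 1 : 0) });
        out.flush();
        return bytes.toByteArray();
    }

    // An older reader: keeps the fields it knows and skips the rest by length.
    public static Map<String, Object> readOldFormat(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Map<String, Object> result = new HashMap<>();
        while (in.available() > 0) {
            int tag = in.readInt();
            int len = in.readInt();
            byte[] payload = new byte[len];
            in.readFully(payload);
            if (tag == FIELD_PATH) {
                result.put("path", new String(payload, StandardCharsets.UTF_8));
            } else if (tag == FIELD_LENGTH) {
                result.put("length", new DataInputStream(new ByteArrayInputStream(payload)).readLong());
            }
            // Unknown tags are ignored, so newer writers can add fields safely.
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeNewFormat("/user/alice/file.txt", 1024L, true);
        Map<String, Object> status = readOldFormat(data);
        System.out.println(status.get("path") + " " + status.get("length"));
    }
}
```

Running main prints the path and length recovered by the old reader.  A positional Writable-style encoding carries no tags or lengths, so an old reader would leave the extra bytes unconsumed and misread whatever follows them in the stream; that is the limitation a protobuf-based format avoids.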

> In Hadoop 3, make FileStatus no longer a Writable
> -------------------------------------------------
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this to serialize
it and send it over the wire.  But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}}
which serves to serialize this information.  The protobuf form is preferable, since it allows
us to add new fields in a backwards-compatible way.  Another issue is that many subclasses of
FileStatus already fail to override the Writable methods of the superclass, breaking the interface
contract that reading back the output of status.write() should yield an object equal to the original status.
> In Hadoop 3, we should just make FileStatus no longer a Writable so that we don't have
to deal with these issues.  It's probably too late to do this in Hadoop 2, since user code
may rely on the ability to use the Writable methods on FileStatus objects there.
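The broken round-trip contract described in the issue can be reproduced in a small Java sketch.  To compile without Hadoop on the classpath, it defines a local SimpleWritable interface with the same write(DataOutput)/readFields(DataInput) shape as Hadoop's Writable; BaseStatus stands in for FileStatus, and SubStatus is a hypothetical subclass that adds a field without overriding the Writable methods.  All class and field names here are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {
    // Local stand-in with the same method shape as Hadoop's Writable.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Stand-in for FileStatus: serializes only the fields it declares.
    static class BaseStatus implements SimpleWritable {
        String path = "";
        long length;

        public void write(DataOutput out) throws IOException {
            out.writeUTF(path);
            out.writeLong(length);
        }

        public void readFields(DataInput in) throws IOException {
            path = in.readUTF();
            length = in.readLong();
        }
    }

    // Hypothetical subclass that adds a field but does not override
    // write/readFields, so the extra field never reaches the stream.
    static class SubStatus extends BaseStatus {
        String extra = "";
    }

    // Serializes src, then deserializes the bytes into dst.
    static void roundTrip(SimpleWritable src, SimpleWritable dst) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        src.write(new DataOutputStream(bytes));
        dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        SubStatus original = new SubStatus();
        original.path = "/user/alice/file.txt";
        original.length = 1024L;
        original.extra = "some-subclass-data";

        SubStatus copy = new SubStatus();
        roundTrip(original, copy);

        // path and length survive; the subclass-only field is silently lost.
        System.out.println(copy.path + " " + copy.length + " extra='" + copy.extra + "'");
    }
}
```

Running main shows the path and length intact but an empty extra field, so the deserialized copy is no longer equal to the original: the contract violation the issue describes, with nothing at compile time or run time to flag it.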

This message was sent by Atlassian JIRA
