hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-6984) In Hadoop 3, make FileStatus serialize itself via protobuf
Date Wed, 14 Dec 2016 02:27:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746989#comment-15746989
] 

Chris Douglas edited comment on HDFS-6984 at 12/14/16 2:27 AM:
---------------------------------------------------------------

bq. This means we need to be able to add new fields to FileStatus, and that PathHandle should
be an opaque PB-encoded blob. It doesn't require cross-serialization between FileStatus and
HdfsFileStatus, and we don't necessarily need FileStatus to be serializable either.
Yes, I agree. To add new fields to {{FileStatus}}, we were changing its serialization in 3.x
to PB (since updating {{Writable}} is problematic). If we're going to use PB, then arranging
its fields so it _could_ be compatible with {{HdfsFileStatus}} was just anticipating future
changes; you're right, cross-serialization is not a requirement. By arranging the type this
way, the {{bytes}} field is accessible deserialized into a vanilla {{FileStatus}}, but if
one were to deserialize into {{HdfsFileStatus}} then the {{PathHandle}} could have typed,
versioned fields without extra steps. It's free to make this easy, so the patch doesn't make
it hard.

The current proposal in HDFS-7878 offers {{open(FileStatus)}} instead of {{open(PathHandle)}}
to avoid adding new functions to {{FileSystem}} for listing, deleting, renaming, etc. against
a {{PathHandle}}. It'd be convenient if, as is fairly common, processes could just throw the
result of a {{FileSystem}} query to a service/framework using some of the automatic serialization
(e.g., {{Serializable}} in Spark and {{Writable}} in MapReduce), but we could move it to a
library. Again, I'm pretty ambivalent about removing {{Writable}} from {{FileStatus}}, since
we still have it on other data returned by {{FileSystem}} (like {{Path}}), but it is largely
vestigial in 3.x.

bq. Is HDFS-9806 using the FileSystem interface to access the backing storage system? It sounds
like we need to be able to add a new field to FileStatus, but not necessarily cross-serialization
(or serialization) of a FileStatus.

Yes, that's correct. We just need a consistent place to add "metadata that uniquely identifies
a file" so we can handle it consistently across {{FileSystem}} implementations.

bq. Regarding cross-serialization, yes we can save some copies by making HdfsFileStatus implement
FileStatus and returning an HdfsFileStatus directly in FileSystem, but HdfsFileStatus has
even more fields that a user is unlikely to care about. It also seems weird from an API perspective
to include extra, inaccessible data in the serialized version of a class.

If the record doesn't leave the process or the deserializer knew to use a {{HdfsFileStatus}}
then it'd be accessible. Unlike today, a client could use the data we return from the NN if
we didn't throw it out.

I hear you on the inefficiency of carrying excess data per record, but the NN is a more critical
bottleneck, so perhaps we should focus on not returning the unused data at all. Among clients,
unless they're caching thousands of {{FileStatus}} objects in memory, the overhead is negligible,
and the solution is to build efficient composites (i.e., instead of shaving redundant bytes
per record, we can compress repeated data across millions of records).


was (Author: chris.douglas):
bq. This means we need to be able to add new fields to FileStatus, and that PathHandle should
be an opaque PB-encoded blob. It doesn't require cross-serialization between FileStatus and
HdfsFileStatus, and we don't necessarily need FileStatus to be serializable either.
Yes, I agree. To add new fields to {{FileStatus}}, we were changing its serialization in 3.x
to PB (since updating {{Writable}} is problematic). If we're going to use PB, then arranging
its fields so it _could_ be compatible with {{HdfsFileStatus}} was just anticipating future
changes; you're right, cross-serialization is not a requirement. By arranging the type this
way, the {{bytes}} field is accessible deserialized into a vanilla {{FileStatus}}, but if
one were to deserialize into {{HdfsFileStatus}} then the {{PathHandle}} could have typed,
versioned fields without extra steps. It's free to make this easy, so the patch doesn't make
it hard.

The current proposal in HDFS-7878 offers {{open(FileStatus)}} instead of {{open(PathHandle)}}
to avoid adding new functions to {{FileSystem}} for listing, deleting, renaming, etc. against
a {{PathHandle}}. It'd be convenient if, as is fairly common, processes could just throw the
result of a {{FileSystem}} query to a service/framework using some of the automatic serialization
(e.g., {{Serializable}} in Spark and {{Writable}} in MapReduce), but we could move it to a
library. Again, I'm pretty ambivalent about removing {{Writable}} from {{FileStatus}}, since
we still have it on other data returned by {{FileSystem}} (like {{Path}}), but it is largely
vestigial in 3.x.

bq, Is HDFS-9806 using the FileSystem interface to access the backing storage system? It sounds
like we need to be able to add a new field to FileStatus, but not necessarily cross-serialization
(or serialization) of a FileStatus.

Yes, that's correct. We just need a consistent place to add "metadata that uniquely identifies
a file" so we can handle it consistently across {{FileSystem}} implementations.

bq. Regarding cross-serialization, yes we can save some copies by making HdfsFileStatus implement
FileStatus and returning an HdfsFileStatus directly in FileSystem, but HdfsFileStatus has
even more fields that a user is unlikely to care about. It also seems weird from an API perspective
to include extra, inaccessible data in the serialized version of a class.

If the record doesn't leave the process or the deserializer knew to use a {{HdfsFileStatus}}
then it'd be accessible. Unlike today, a client could use the data we return from the NN if
we didn't throw it out.

I hear you on the inefficiency of carrying excess data per record, but the NN is a more critical
bottleneck, so perhaps we should focus on not returning the unused data at all. Among clients,
unless they're caching thousands of {{FileStatus}} objects in memory, the overhead is negligible,
and the solution is to build efficient composites (i.e., instead of shaving redundant bytes
per record, we can compress repeated data across millions of records).

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, HDFS-6984.003.patch, HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier.  Originally, we used this to serialize
it and send it over the wire.  But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}}
which serves to serialize this information.  The protobuf form is preferable, since it allows
us to add new fields in a backwards-compatible way.  Another issue is that already a lot of
subclasses of FileStatus don't override the Writable methods of the superclass, breaking the
interface contract that read(status.write) should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so that we
don't have to deal with these issues.  It's probably too late to do this in Hadoop 2, since
user code may be relying on the existing FileStatus serialization there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message