Date: Tue, 6 Dec 2016 08:20:58 +0000 (UTC)
From: "Andrew Wang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus serialize itself via protobuf

[ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724789#comment-15724789 ]

Andrew Wang commented on HDFS-6984:
-----------------------------------

Thanks for revving this Chris.
I don't have a ton of background here, but my review:

* Can we split the Serializable stuff into a separate change? Changing booleans into Boolean objects adds overhead, and combining these two changes makes it harder to review. Also, I am not a Java serialization expert, but IIUC maintaining compatibility is difficult, which means maybe people should just use PB anyway.
* The serialization loses the extra {{isEncrypted}} and {{FsPermission#getAclBit}} bits since it calls {{toShort}} rather than {{toExtendedShort}}. Seems like we should save these, though the fact that we pack these bits into FsPermission is an internal implementation detail. Adding new booleans to FileStatus might break cross-serialization with HdfsFileStatus though. What is the use case for cross-serialization?
* Could use some more extensive unit tests that test the default values and error conditions with the shorts.
* Test nit: please also avoid the wildcard import.

Regarding Steve's comment on bounds checking, PB by default has a 64MB max message size. We could use that as a reasonable upper bound on the size of a FileStatus.

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, HDFS-6984.003.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier. Originally, we used this to serialize it and send it over the wire. But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}} which serves to serialize this information. The protobuf form is preferable, since it allows us to add new fields in a backwards-compatible way.
> Another issue is that many subclasses of FileStatus already don't override the Writable methods of the superclass, breaking the interface contract that read(status.write) should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so that we don't have to deal with these issues. It's probably too late to do this in Hadoop 2, since user code may be relying on the existing FileStatus serialization there.
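
To make the {{toShort}} versus {{toExtendedShort}} point from the review concrete, here is a minimal, self-contained sketch of how a round trip through the shorter encoding can drop flags and so break the read(write(status)) contract mentioned in the description. The class {{PermissionBitsSketch}}, its fields, the {{fromShort}} and {{sameAs}} helpers, and the bit positions are hypothetical stand-ins chosen for illustration. This is not the real FsPermission code; only the two method names are borrowed from the discussion.

```java
// Illustrative stand-in only; NOT org.apache.hadoop.fs.permission.FsPermission.
final class PermissionBitsSketch {
    // Assumed layout: 9 rwx bits plus a sticky bit, with two extra flags above them.
    static final int BASIC_MASK = 01777;
    static final int ACL_BIT = 1 << 12;        // hypothetical position
    static final int ENCRYPTED_BIT = 1 << 13;  // hypothetical position

    final int mode;
    final boolean aclBit;
    final boolean encrypted;

    PermissionBitsSketch(int mode, boolean aclBit, boolean encrypted) {
        this.mode = mode & BASIC_MASK;
        this.aclBit = aclBit;
        this.encrypted = encrypted;
    }

    // Encodes only the basic permission bits; the two flags are not written.
    short toShort() {
        return (short) mode;
    }

    // Encodes the basic bits plus the two extra flags.
    short toExtendedShort() {
        return (short) (mode | (aclBit ? ACL_BIT : 0) | (encrypted ? ENCRYPTED_BIT : 0));
    }

    // Decodes either encoding; flags absent from the input come back as false.
    static PermissionBitsSketch fromShort(short n) {
        return new PermissionBitsSketch(
            n & BASIC_MASK, (n & ACL_BIT) != 0, (n & ENCRYPTED_BIT) != 0);
    }

    boolean sameAs(PermissionBitsSketch other) {
        return mode == other.mode
            && aclBit == other.aclBit
            && encrypted == other.encrypted;
    }

    public static void main(String[] args) {
        PermissionBitsSketch original = new PermissionBitsSketch(0644, true, true);
        // Round trip through the short form loses both flags: prints false.
        System.out.println(original.sameAs(fromShort(original.toShort())));
        // Round trip through the extended form keeps them: prints true.
        System.out.println(original.sameAs(fromShort(original.toExtendedShort())));
    }
}
```

A unit test along these lines, run against the real classes, would be one way to cover the "default values and error conditions with the shorts" item from the review.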