Date: Fri, 9 Dec 2016 20:23:58 +0000 (UTC)
From: "Andrew Wang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6984) In Hadoop 3, make FileStatus serialize itself via protobuf

[ https://issues.apache.org/jira/browse/HDFS-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736258#comment-15736258 ]

Andrew Wang commented on HDFS-6984:
-----------------------------------

Here's also a more radical idea: how about we simply stop implementing Writable altogether in FileStatus?
I did an empirical test by successfully building all of CDH with a patch that does this. This includes apps like HBase, Hive, Spark, Solr, Impala, Oozie, Avro, Parquet, etc. I'm guessing FileStatus was originally made Writable as a shortcut for DistCp. Maybe there are custom apps that use Writable, but the vast majority of Cloudera users interact with a Hadoop cluster via a higher-level framework.

> In Hadoop 3, make FileStatus serialize itself via protobuf
> ----------------------------------------------------------
>
>                 Key: HDFS-6984
>                 URL: https://issues.apache.org/jira/browse/HDFS-6984
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>   Affects Versions: 3.0.0-alpha1
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6984.001.patch, HDFS-6984.002.patch, HDFS-6984.003.patch, HDFS-6984.nowritable.patch
>
>
> FileStatus was a Writable in Hadoop 2 and earlier. Originally, we used this to serialize it and send it over the wire. But in Hadoop 2 and later, we have the protobuf {{HdfsFileStatusProto}} which serves to serialize this information. The protobuf form is preferable, since it allows us to add new fields in a backwards-compatible way. Another issue is that already a lot of subclasses of FileStatus don't override the Writable methods of the superclass, breaking the interface contract that read(status.write) should be equal to the original status.
> In Hadoop 3, we should just make FileStatus serialize itself via protobuf so that we don't have to deal with these issues. It's probably too late to do this in Hadoop 2, since user code may be relying on the existing FileStatus serialization there.
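
The round-trip contract mentioned in the issue description (reading back what write() produced should yield an object equal to the original) can be illustrated with a small self-contained sketch. It is written in Python for brevity and does not use Hadoop's API: `Status`, `TaggedStatus`, and `round_trip` are hypothetical stand-ins for FileStatus, a subclass that adds a field without overriding the serialization methods, and a write-then-read helper.

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical stand-in for a FileStatus-like base class (not Hadoop code)."""
    path: str
    length: int

    def write(self, out):
        # Serialize only the fields the base class knows about.
        data = self.path.encode("utf-8")
        out.write(struct.pack(">I", len(data)))
        out.write(data)
        out.write(struct.pack(">q", self.length))

    def read_fields(self, inp):
        # Mirror of write(): restore the same fields in the same order.
        (n,) = struct.unpack(">I", inp.read(4))
        self.path = inp.read(n).decode("utf-8")
        (self.length,) = struct.unpack(">q", inp.read(8))


@dataclass
class TaggedStatus(Status):
    """Hypothetical subclass: adds a field but inherits write/read_fields as-is."""
    tag: str = ""


def round_trip(status, blank):
    """Write `status` to a buffer, then read it back into `blank`."""
    buf = io.BytesIO()
    status.write(buf)
    buf.seek(0)
    blank.read_fields(buf)
    return blank


# The base class satisfies the contract: the copy equals the original.
base = Status("/data/a", 42)
base_ok = round_trip(base, Status("", 0)) == base

# The subclass does not: `tag` is never written, so it is lost on the way.
tagged = TaggedStatus("/data/a", 42, tag="extra")
tagged_ok = round_trip(tagged, TaggedStatus("", 0)) == tagged
```

Here `base_ok` is true while `tagged_ok` is false, because the inherited write() never emits the `tag` field. That is the kind of silent contract break the description attributes to FileStatus subclasses that do not override the Writable methods.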