From: Tsuyoshi Ozawa
Date: Mon, 27 Mar 2017 21:16:21 +0900
Subject: Can we update protobuf's version on trunk?
To: common-dev@hadoop.apache.org, yarn-dev@hadoop.apache.org, hdfs-dev@hadoop.apache.org, mapreduce-dev@hadoop.apache.org

Dear Hadoop developers,

Now that the shaded client introduced by HADOOP-11804 has been merged, we can more easily update dependencies on trunk while minimizing the impact on backward compatibility. (Thanks Sean and Sangjin for taking the issue!)

So, is it time to update protobuf to the latest version on trunk? Could you share your opinions here?

There have been several discussions in parallel so far, so I would like to summarize the current opinions of developers as I understand them.

Stack mentioned on HADOOP-13363:

* Would this be a problem? Old clients can talk to the new servers because of wire compatibility. Is anyone other than Hadoop consuming the Hadoop protos directly? Are the Hadoop proto files considered InterfaceAudience.Private or InterfaceAudience.Public? If the former, I could work on a patch for 3.0.0 (it'd be big but boring). Does Hadoop have protobuf in its API anywhere? (I can take a look, but I'm being lazy and asking here first.)

gohadoop[1] uses the proto files directly, treating them as a stable interface.

[1] https://github.com/hortonworks/gohadoop/search?utf8=%E2%9C%93&q=*proto&type=

Fortunately, no additional work is needed to compile the Hadoop code base. The only change I made was to update getOndiskTrunkSize's argument to take a protobuf v3 object[2]. Please point it out if I'm missing something.

[2] https://issues.apache.org/jira/secure/attachment/12860647/HADOOP-13363.004.patch

There are some concerns against updating protobuf, raised on HDFS-11010:

* I'm really hesitant to bump PB considering the pain it brought last time. (by Andrew)

This concern is about *binary* compatibility, not wire compatibility. If I understand correctly, last time the problem was caused by v2.4.0 and v2.5.0 classes being mixed between Hadoop and HBase. (I learned this from Steve's comment on HADOOP-13363[3].) As I mentioned at the beginning, protobuf is now shaded on trunk, so we no longer need to worry about binary (source-code-level) compatibility.

[3] https://issues.apache.org/jira/browse/HADOOP-13363?focusedCommentId=15372724&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15372724

* Have we checked if it's wire compatible with our current version of PB? (by Andrew)

As far as I know, protobuf v2 and v3 are wire compatible, and the Google team has been testing this. Of course, we can also validate it ourselves using the following compatibility test suite:

https://chromium.googlesource.com/external/github.com/google/protobuf/+/master/java/compatibility_tests/README.md

* Let me ask the question in a different way: what about PB 3 is concerning to you? (by Anu)

* Some of its incompatibilities with 2.x, such as dropping unknown fields from records. Any component that proxies records must have an updated version of the schema, or it will silently drop data and convert unknown values to defaults. Unknown enum value handling has changed. There's no mention of the convenient "Embedded messages are compatible with bytes if the bytes contain an encoded version of the message" semantics in proto3. (by Chris)

This is what we need to discuss. Quoting the documentation from Google's developer manual,
https://developers.google.com/protocol-buffers/docs/proto3#unknowns

> For most Google protocol buffers implementations, unknown fields are not accessible in proto3 via the corresponding proto runtimes, and are dropped and forgotten at deserialization time. This is different behaviour to proto2, where unknown fields are always preserved and serialized along with the message.
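To make the proxy scenario Chris describes concrete, here is a minimal sketch. The message and field names are hypothetical, invented only for illustration; they are not taken from the Hadoop .proto files:

    // Version 1 of a hypothetical schema, as compiled into an
    // intermediate daemon that proxies records between processes.
    syntax = "proto3";

    message BlockReport {
      string block_pool_id = 1;
      // Field number 2 does not exist yet in this version.
    }

    // Version 2 of the same hypothetical schema, as used by an
    // updated client and server on either side of that proxy.
    syntax = "proto3";

    message BlockReport {
      string block_pool_id = 1;
      uint64 total_blocks = 2;  // Unknown to a proxy built against v1.
    }

Under proto2 semantics, a proxy built against version 1 that parses and re-serializes a message carries field 2 through as an unknown field. Under the proto3 semantics quoted above, the same proxy drops field 2 at deserialization time, so the message reaches the server with total_blocks silently reset to its default value (0).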
Is this incompatibility acceptable for us, or not? If we need to check some test cases before updating protobuf, it would be good to clarify them here and run them now.

Best regards,
- Tsuyoshi

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org