hadoop-common-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: Can we update protobuf's version on trunk?
Date Thu, 30 Mar 2017 23:10:45 GMT
Great. If y'all are satisfied, I am too.

My only other request is that we shade PB even for the non-client JARs,
since empirically there are a lot of downstream projects pulling in our
server-side artifacts.

On Thu, Mar 30, 2017 at 9:55 AM, Stack <stack@duboce.net> wrote:

> On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas <chris.douglas@gmail.com>
> wrote:
>
>> On Wed, Mar 29, 2017 at 4:59 PM, Stack <stack@duboce.net> wrote:
>> >> The former; an intermediate handler decoding, [modifying,] and
>> >> encoding the record without losing unknown fields.
>> >>
>> >
>> > I did not try this. Did you? Otherwise I can.
>>
>> Yeah, I did. Same format. -C
>>
>>
> Grand.
> St.Ack
>
>
>
>
>> >> This looks fine. -C
>> >>
>> >> > Thanks,
>> >> > St.Ack
>> >> >
>> >> >
>> >> > # Using the protoc v3.0.2 tool
>> >> > $ protoc --version
>> >> > libprotoc 3.0.2
>> >> >
>> >> > # I have a simple proto definition with two fields in it
>> >> > $ more pb.proto
>> >> > message Test {
>> >> >   optional string one = 1;
>> >> >   optional string two = 2;
>> >> > }
>> >> >
>> >> > # This is a text-encoded instance of a 'Test' proto message:
>> >> > $ more pb.txt
>> >> > one: "one"
>> >> > two: "two"
>> >> >
>> >> > # Now I encode the above as a pb binary
>> >> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
>> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> >> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";'
>> >> > or 'syntax = "proto3";' to specify a syntax version. (Defaulted to
>> >> > proto2 syntax.)
>> >> >
>> >> > # Here is a dump of the binary
>> >> > $ od -xc pb.bin
>> >> > 0000000      030a    6e6f    1265    7403    6f77
>> >> >           \n 003   o   n   e 022 003   t   w   o
>> >> > 0000012
>> >> >
>> >> > # Here is a proto definition file that has a Test message minus the
>> >> > # 'two' field.
>> >> > $ more pb_drops_two.proto
>> >> > message Test {
>> >> >   optional string one = 1;
>> >> > }
>> >> >
>> >> > # Use it to decode the bin file:
>> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
>> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
>> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
>> >> > (Defaulted to proto2 syntax.)
>> >> > one: "one"
>> >> > 2: "two"
>> >> >
>> >> > Note how the second field is preserved (absent a field name). It is
>> >> > not dropped.
>> >> >
>> >> > If I change the syntax of pb_drops_two.proto to be proto3, the field
>> >> > IS dropped.
>> >> >
>> >> > # Here is the proto file with proto3 syntax specified (had to drop the
>> >> > # 'optional' qualifier -- not allowed in proto3):
>> >> > $ more pb_drops_two.proto
>> >> > syntax = "proto3";
>> >> > message Test {
>> >> >   string one = 1;
>> >> > }
>> >> >
>> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin > pb_drops_two.txt
>> >> > $ more pb_drops_two.txt
>> >> > one: "one"
>> >> >
>> >> >
>> >> > I cannot reencode the text output using pb_drops_two.proto. It
>> >> > complains:
>> >> >
>> >> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
>> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
>> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
>> >> > (Defaulted to proto2 syntax.)
>> >> > input:2:1: Expected identifier, got: 2
>> >> >
>> >> > Protobuf 2.5 does the same:
>> >> >
>> >> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
>> >> > input:2:1: Expected identifier.
>> >> > Failed to parse input.
>> >> >
>> >> > St.Ack
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <stack@duboce.net> wrote:
>> >> >>
>> >> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.wang@cloudera.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> >
>> >> >>> > >> If unknown fields are dropped, then applications proxying tokens
>> >> >>> > >> and other data between servers will effectively corrupt those
>> >> >>> > >> messages, unless we make everything opaque bytes, which- absent
>> >> >>> > >> the convenient, prenominate semantics managing the conversion-
>> >> >>> > >> obviate the compatibility machinery that is the whole point of
>> >> >>> > >> PB. Google is removing the features that justified choosing PB
>> >> >>> > >> over its alternatives. Since we can't require that our
>> >> >>> > >> applications compile (or link) against our updated schema, this
>> >> >>> > >> creates a problem that PB was supposed to solve.
>> >> >>> > >
>> >> >>> > >
>> >> >>> > > This is scary, and it potentially affects services outside of
>> >> >>> > > the Hadoop codebase. This makes it difficult to assess the impact.
>> >> >>> >
>> >> >>> > Stack mentioned a compatibility mode that uses the proto2
>> >> >>> > semantics. If that carries unknown fields through intermediate
>> >> >>> > handlers, then this objection goes away. -C
>> >> >>>
>> >> >>>
>> >> >>> Did some more googling, found this:
>> >> >>>
>> >> >>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>> >> >>>
>> >> >>> Feng Xiao appears to be a Google engineer, and suggests workarounds
>> >> >>> like packing the fields into a byte type. No mention of a PB2
>> >> >>> compatibility mode. Also here:
>> >> >>>
>> >> >>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>> >> >>>
>> >> >>> Participants say that unknown fields were dropped for automatic
>> >> >>> JSON encoding, since you can't losslessly convert to JSON without
>> >> >>> knowing the type.
>> >> >>>
>> >> >>> Unfortunately, it sounds like these are intrinsic differences
>> >> >>> with PB3.
>> >> >>>
>> >> >>
>> >> >> As I read it Andrew, the field-dropping happens when pb3 is running
>> >> >> in proto3 'mode'. Let me try it...
>> >> >>
>> >> >> St.Ack
>> >> >>
>> >> >>
>> >> >>>
>> >> >>> Best,
>> >> >>> Andrew
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>>
>
>
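
For anyone following the experiment quoted above, the od dump of pb.bin can be read by hand: the bytes are 0a 03 "one" 12 03 "two", i.e. two length-delimited fields numbered 1 and 2. The sketch below is a rough model of an intermediate handler that decodes the record, modifies the field it knows about, and re-encodes it, once carrying unknown fields through (the proto2 behavior shown above) and once dropping them (what protoc 3.0.2 did with proto3 syntax in the test above). It is an assumption-laden illustration that does not use the protobuf library; every function name is hypothetical and it only handles wire type 2.

```python
# Hand-rolled walk over the protobuf wire format; no protobuf library needed.
# All names here are illustrative and not part of any protobuf API.

def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def parse_fields(buf):
    """Return [(field_number, payload)] for length-delimited fields only."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError("this sketch only handles wire type 2")
        length, pos = read_varint(buf, pos)
        fields.append((field_number, buf[pos:pos + length]))
        pos += length
    return fields

def encode_fields(fields):
    """Serialize [(field_number, payload)] back to length-delimited wire bytes."""
    out = bytearray()
    for number, payload in fields:
        out += write_varint((number << 3) | 2)
        out += write_varint(len(payload))
        out += payload
    return bytes(out)

def proxy(buf, known, keep_unknown):
    """Upper-case the known fields; carry or drop the fields we don't know."""
    out = []
    for number, payload in parse_fields(buf):
        if number in known:
            out.append((number, payload.upper()))
        elif keep_unknown:
            out.append((number, payload))
    return encode_fields(out)

# The ten bytes from the od dump of pb.bin.
PB_BIN = bytes.fromhex("0a036f6e65120374776f")

print(parse_fields(PB_BIN))             # [(1, b'one'), (2, b'two')]
print(proxy(PB_BIN, {1}, True).hex())   # 0a034f4e45120374776f
print(proxy(PB_BIN, {1}, False).hex())  # 0a034f4e45
```

With keep_unknown=False the handler silently loses field 2 on the way through, which is the corruption-by-proxy concern raised earlier in the thread.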
