hadoop-common-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: Can we update protobuf's version on trunk?
Date Thu, 30 Mar 2017 16:55:24 GMT
On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas <chris.douglas@gmail.com> wrote:

> On Wed, Mar 29, 2017 at 4:59 PM, Stack <stack@duboce.net> wrote:
> >> The former; an intermediate handler decoding, [modifying,] and
> >> encoding the record without losing unknown fields.
> >>
> >
> > I did not try this. Did you? Otherwise I can.
>
> Yeah, I did. Same format. -C
>
>
Grand.
St.Ack




> >> This looks fine. -C
> >>
> >> > Thanks,
> >> > St.Ack
> >> >
> >> >
> >> > # Using the protoc v3.0.2 tool
> >> > $ protoc --version
> >> > libprotoc 3.0.2
> >> >
> >> > # I have a simple proto definition with two fields in it
> >> > $ more pb.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> >   optional string two = 2;
> >> > }
> >> >
> >> > # This is a text-encoded instance of a 'Test' proto message:
> >> > $ more pb.txt
> >> > one: "one"
> >> > two: "two"
> >> >
> >> > # Now I encode the above as a pb binary
> >> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> >> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";'
> >> > or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> >> > syntax.)
> >> >
> >> > # Here is a dump of the binary
> >> > $ od -xc pb.bin
> >> > 0000000      030a    6e6f    1265    7403    6f77
> >> >           \n 003   o   n   e 022 003   t   w   o
> >> > 0000012
> >> >
> >> > # Here is a proto definition file that has a Test Message minus the
> >> > # 'two' field.
> >> > $ more pb_drops_two.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> > }
> >> >
> >> > # Use it to decode the bin file:
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted to proto2 syntax.)
> >> > one: "one"
> >> > 2: "two"
> >> >
> >> > Note how the second field is preserved (absent a field name). It is not
> >> > dropped.
> >> >
> >> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> >> > dropped.
> >> >
> >> > # Here is the proto file with proto3 syntax specified (had to drop the
> >> > # 'optional' qualifier -- not allowed in proto3):
> >> > $ more pb_drops_two.proto
> >> > syntax = "proto3";
> >> > message Test {
> >> >   string one = 1;
> >> > }
> >> >
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
> >> > $ more pb_drops_two.txt
> >> > one: "one"
> >> >
> >> >
> >> > I cannot reencode the text output using pb_drops_two.proto. It
> >> > complains:
> >> >
> >> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted to proto2 syntax.)
> >> > input:2:1: Expected identifier, got: 2
> >> >
> >> > Protobuf 2.5 does the same:
> >> >
> >> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> >> > input:2:1: Expected identifier.
> >> > Failed to parse input.
> >> >
> >> > St.Ack
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <stack@duboce.net> wrote:
> >> >>
> >> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.wang@cloudera.com>
> >> >> wrote:
> >> >>>
> >> >>> >
> >> >>> > >> If unknown fields are dropped, then applications proxying tokens
> >> >>> > >> and other data between servers will effectively corrupt those
> >> >>> > >> messages, unless we make everything opaque bytes, which- absent the
> >> >>> > >> convenient, prenominate semantics managing the conversion- obviate
> >> >>> > >> the compatibility machinery that is the whole point of PB. Google is
> >> >>> > >> removing the features that justified choosing PB over its
> >> >>> > >> alternatives. Since we can't require that our applications compile
> >> >>> > >> (or link) against our updated schema, this creates a problem that PB
> >> >>> > >> was supposed to solve.
> >> >>> > >
> >> >>> > >
> >> >>> > > This is scary, and it potentially affects services outside of the
> >> >>> > > Hadoop codebase. This makes it difficult to assess the impact.
> >> >>> >
> >> >>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
> >> >>> > If that carries unknown fields through intermediate handlers, then
> >> >>> > this objection goes away. -C
> >> >>>
> >> >>>
> >> >>> Did some more googling, found this:
> >> >>>
> >> >>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
> >> >>>
> >> >>> Feng Xiao appears to be a Google engineer, and suggests workarounds
> >> >>> like packing the fields into a byte type. No mention of a PB2
> >> >>> compatibility mode. Also here:
> >> >>>
> >> >>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
> >> >>>
> >> >>> Participants say that unknown fields were dropped for automatic JSON
> >> >>> encoding, since you can't losslessly convert to JSON without knowing
> >> >>> the type.
> >> >>>
> >> >>> Unfortunately, it sounds like these are intrinsic differences with
> >> >>> PB3.
> >> >>>
> >> >>
> >> >> As I read it Andrew, the field-dropping happens when pb3 is running in
> >> >> proto3 'mode'. Let me try it...
> >> >>
> >> >> St.Ack
> >> >>
> >> >>
> >> >>>
> >> >>> Best,
> >> >>> Andrew
> >> >>
> >> >>
> >> >
> >
> >
>
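
For readers following the quoted experiment, here is a rough sketch of what
an "intermediate handler" that does or does not carry unknown fields looks
like at the wire level. It is hand-rolled Python, not the protobuf library's
API, and it only understands length-delimited fields (wire type 2), which is
all the quoted pb.bin contains. The names PB_BIN, KNOWN_FIELDS and proxy are
made up for illustration; the bytes are transcribed from the od dump in the
thread.

```python
# Bytes of pb.bin, transcribed from the `od -xc` dump quoted in the thread
# (od -x prints little-endian 16-bit words, so 030a is the bytes 0a 03).
PB_BIN = bytes.fromhex("0a036f6e65120374776f")

# Hypothetical: field numbers the intermediate handler's schema knows about,
# mirroring pb_drops_two.proto, which only declares field 1 ('one').
KNOWN_FIELDS = {1}


def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def parse_fields(buf):
    """Split a message into (field_number, wire_type, payload) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("sketch only handles length-delimited fields")
        length, pos = read_varint(buf, pos)
        fields.append((number, wire_type, buf[pos:pos + length]))
        pos += length
    return fields


def encode_fields(fields):
    """Re-serialize (field_number, wire_type, payload) triples."""
    out = bytearray()
    for number, wire_type, payload in fields:
        out += write_varint((number << 3) | wire_type)
        out += write_varint(len(payload))
        out += payload
    return bytes(out)


def proxy(buf, keep_unknown):
    """Decode and re-encode, optionally dropping fields not in KNOWN_FIELDS."""
    fields = parse_fields(buf)
    if not keep_unknown:
        fields = [f for f in fields if f[0] in KNOWN_FIELDS]
    return encode_fields(fields)


if __name__ == "__main__":
    print(parse_fields(PB_BIN))
    print(proxy(PB_BIN, keep_unknown=True).hex())
    print(proxy(PB_BIN, keep_unknown=False).hex())
```

With keep_unknown=True the handler forwards the bytes unchanged, which
matches the proto2 decode in the thread where field 2 survives as an unnamed
entry. With keep_unknown=False field 2 is gone after one hop, which is the
proxying concern raised earlier in the thread.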
