hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stack <st...@duboce.net>
Subject Re: Can we update protobuf's version on trunk?
Date Wed, 29 Mar 2017 20:13:44 GMT
Is the below evidence enough that pb3 in proto2 syntax mode does not drop
'unknown' fields? (Maybe you want evidence that java tooling behaves the
same?)

To be clear, when we say proxy above, are we expecting that a pb message
deserialized by a process down-the-line that happens to have a crimped
proto definition that is absent a couple of fields somehow can re-serialize
and at the end of the line, all fields are present? Or are we talking
pass-through of the message without rewrite?

Thanks,
St.Ack


# Using the protoc v3.0.2 tool
$ protoc --version
libprotoc 3.0.2

# I have a simple proto definition with two fields in it
$ more pb.proto
message Test {
  optional string one = 1;
  optional string two = 2;
}

# This is a text-encoded instance of a 'Test' proto message:
$ more pb.txt
one: "one"
two: "two"

# Now I encode the above as a pb binary
$ protoc --encode=Test pb.proto < pb.txt > pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
syntax.)

# Here is a dump of the binary
$ od -xc pb.bin
0000000      030a    6e6f    1265    7403    6f77
          \n 003   o   n   e 022 003   t   w   o
0000012

# Here is a proto definition file that has a Test Message minus the 'two'
field.
$ more pb_drops_two.proto
message Test {
  optional string one = 1;
}

# Use it to decode the bin file:
$ protoc --decode=Test pb_drops_two.proto < pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
one: "one"
2: "two"

Note how the second field is preserved (absent a field name). It is not
dropped.

If I change the syntax of pb_drops_two.proto to be proto3, the field IS
dropped.

# Here proto file with proto3 syntax specified (had to drop the 'optional'
qualifier -- not allowed in proto3):
$ more pb_drops_two.proto
syntax = "proto3";
message Test {
  string one = 1;
}

$ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
$ more pb_drops_two.txt
one: "one"


I cannot reencode the text output using pb_drops_two.proto. It complains:

$ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
pb_drops_two.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
input:2:1: Expected identifier, got: 2

Proto 2.5 does same:

$ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
pb_drops_two.txt > pb_drops_two.bin
input:2:1: Expected identifier.
Failed to parse input.

St.Ack






On Wed, Mar 29, 2017 at 10:14 AM, Stack <stack@duboce.net> wrote:

> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.wang@cloudera.com>
> wrote:
>
>> >
>> > > If unknown fields are dropped, then applications proxying tokens and
>> > other
>> > >> data between servers will effectively corrupt those messages, unless
>> we
>> > >> make everything opaque bytes, which- absent the convenient,
>> prenominate
>> > >> semantics managing the conversion- obviate the compatibility
>> machinery
>> > that
>> > >> is the whole point of PB. Google is removing the features that
>> justified
>> > >> choosing PB over its alternatives. Since we can't require that our
>> > >> applications compile (or link) against our updated schema, this
>> creates
>> > a
>> > >> problem that PB was supposed to solve.
>> > >
>> > >
>> > > This is scary, and it potentially affects services outside of the
>> Hadoop
>> > > codebase. This makes it difficult to assess the impact.
>> >
>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>> > If that carries unknown fields through intermediate handlers, then
>> > this objection goes away. -C
>>
>>
>> Did some more googling, found this:
>>
>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>>
>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
>> packing the fields into a byte type. No mention of a PB2 compatibility
>> mode. Also here:
>>
>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>>
>> Participants say that unknown fields were dropped for automatic JSON
>> encoding, since you can't losslessly convert to JSON without knowing the
>> type.
>>
>> Unfortunately, it sounds like these are intrinsic differences with PB3.
>>
>>
> As I read it Andrew, the field-dropping happens when pb3 is running in
> proto3 'mode'. Let me try it...
>
> St.Ack
>
>
>
>> Best,
>> Andrew
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message