hbase-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: DISCUSS: Protobufs?
Date Tue, 02 Feb 2016 20:36:04 GMT
I remember talking over Cap'n Proto with Stack and a few others a while ago, when it first got
underway. Since then I think we have seen that Java support is a second- or third-order
concern for them at best.

I think the most important factor is getting away from copying. Whatever we choose should
at least allow, if not promote, buffer/object reuse. Better still if it fits in with what
we have to work with in our offheaping work and HDFS-level APIs. I haven't looked at PB3, but
if it offers that and PB2 wire compatibility, then great.
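
To make the copying point concrete, here is a minimal sketch using only java.nio. Nothing
in it is a protobuf or HBase API; the class and method names (BufferReuseSketch, copyOut,
viewOf) are made up for illustration. It contrasts draining a direct buffer into a fresh
on-heap byte[] with handing out a view that shares the same offheap memory.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only: contrasts copy-out with a shared view.
public class BufferReuseSketch {

    // Copying approach: allocates a new on-heap array and drains the bytes into it.
    public static byte[] copyOut(ByteBuffer src) {
        ByteBuffer dup = src.duplicate();          // leave the caller's position untouched
        byte[] onHeap = new byte[dup.remaining()];
        dup.get(onHeap);
        return onHeap;
    }

    // Zero-copy approach: returns a view over [offset, offset + length)
    // that shares memory with src, so no payload bytes are copied.
    public static ByteBuffer viewOf(ByteBuffer src, int offset, int length) {
        ByteBuffer dup = src.duplicate();
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        for (int i = 0; i < 16; i++) {
            direct.put(i, (byte) i);
        }

        byte[] copy = copyOut(direct);
        ByteBuffer view = viewOf(direct, 4, 8);

        // Mutate the underlying buffer after taking the copy and the view.
        direct.put(4, (byte) 99);

        System.out.println(copy[4]);         // 4: the copy is detached from the buffer
        System.out.println(view.get(0));     // 99: the view shares the buffer's memory
        System.out.println(view.isDirect()); // true: the data never moved on heap
    }
}
```

The view costs a small wrapper object per call but no payload copy, which is the kind of
reuse I would want a serialization library to permit.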


> On Feb 2, 2016, at 12:16 PM, Enis Söztutar <enis.soz@gmail.com> wrote:
> 
> bq. java is second class in flatbuffers).
> This is an Android project, no? I thought Java is the primary target.
> 
>> On Tue, Feb 2, 2016 at 12:14 PM, Stack <stack@duboce.net> wrote:
>> 
>>> On Tue, Feb 2, 2016 at 11:47 AM, Enis Söztutar <enis.soz@gmail.com> wrote:
>>> 
>>> BTW, we should also be looking at https://google.github.io/flatbuffers/ or
>>> https://capnproto.org/ for serialization as an option. The idea is to not
>>> allocate objects and prevent allocations altogether.
>> Agree (I have been watching capnproto for a while; the java implementation has stalled.
>> java is second class in flatbuffers).
>> St.Ack
>> 
>> 
>> 
>>> We are allocating PB objects for every Get / Put, then we allocate our Get
>>> / Put objects. At least we can save on 1.
>>> 
>>> Although, just switching our serialization format again will be a huge
>>> undertaking with obvious wire-incompatibility issues. If PB3 or 2.x gives
>>> us what we want in terms of preventing big byte[] allocations, we would
>>> gain regardless.
>>> 
>>> Enis
>>> 
>>>> On Tue, Feb 2, 2016 at 11:41 AM, Enis Söztutar <enis.soz@gmail.com> wrote:
>>> 
>>>> Google guys over at
>>>> https://github.com/grpc/grpc-java/issues/1054#issuecomment-147295224 are
>>>> saying that CIS changes may be coming to 2.x from what I understand. If so,
>>>> our life would be easier. Even so, I'm 100% sure we have to do shading
>>>> since Hadoop will not change its PB dependency anytime soon.
>>>> 
>>>> We have to do this before doing shading:
>>>> https://issues.apache.org/jira/browse/HBASE-15174
>>>> 
>>>> Enis
>>>> 
>>>>> On Tue, Feb 2, 2016 at 8:15 AM, Stack <stack@duboce.net> wrote:
>>>>> 
>>>>> Thanks Duo. If proto3 had what we wanted, you are suggesting we might move
>>>>> to proto3 setting it to do proto2 support and shade it so we don't clash
>>>>> with other includes of pb?
>>>>> 
>>>>> Regarding Anoop's comment, the note on the end of this issue looks promising
>>>>> but I don't know when it'd see the light of day:
>>>>> https://github.com/grpc/grpc-java/issues/1054#issuecomment-147295224
>>>>> 
>>>>> St.Ack
>>>>> 
>>>>> 
>>>>> On Mon, Feb 1, 2016 at 10:49 PM, Anoop John <anoop.hbase@gmail.com> wrote:
>>>>> 
>>>>>> UnsafeByteStrings - This may help us to avoid copy even without our
>>>>>> HBaseZeroCopyByteString stuff. But with a DirectByteBuffer, it has to copy
>>>>>> data to onheap byte[]. We even want a DBB backing!
>>>>>> 
>>>>>> -Anoop-
>>>>>> 
>>>>>>> On Tue, Feb 2, 2016 at 12:07 PM, 张铎 <palomino219@gmail.com> wrote:
>>>>>>> 
>>>>>>> https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
>>>>>>> 
>>>>>>> PB2 and PB3 are wire compatible, and of course, protobuf-java is not
>>>>>>> compatible so dependency will be a problem... But I think the shaded client
>>>>>>> and server can solve the problem?
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> 2016-02-02 14:27 GMT+08:00 Stack <stack@duboce.net>:
>>>>>>> 
>>>>>>>> We are running into a few issues with protobufs.
>>>>>>>> 
>>>>>>>> + PB always copies all data before making a Message. This generates garbage
>>>>>>>> unnecessarily.
>>>>>>>> + CodedInputStream does not support ByteBuffers in 2.5. In 2.6 it does but
>>>>>>>> again, copies the data out of the BB always; this is especially painful
>>>>>>>> when the BB is a DBB with its data offheap and intent is to keep data
>>>>>>>> offheap.
>>>>>>>> 
>>>>>>>> There are other issues. CIS allocates 4k buffers regardless (See
>>>>>>>> HBASE-15177).
>>>>>>>> And then there was the HBaseZeroCopyByteString fun and games we had a while
>>>>>>>> back.
>>>>>>>> 
>>>>>>>> 3.0 PB adds UnsafeByteStrings so can do zero copy. That's good. But PB3 is
>>>>>>>> incompatible with PB2 (or at least, it looks like PB2 clients can't talk to
>>>>>>>> PB3 [1]).
>>>>>>>> 
>>>>>>>> There is javanano protobufs. All is open access, but it too looks different
>>>>>>>> to PB2 (I've not tried it).
>>>>>>>> 
>>>>>>>> Protostuff seems really quiet these days [2].
>>>>>>>> 
>>>>>>>> Fork (and shade)?
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>> 
>>>>>>>> St.Ack
>>>>>>>> 
>>>>>>>> 1. https://github.com/google/protobuf/releases
>>>>>>>> 2. https://groups.google.com/forum/#!forum/protostuff
>> 
