ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Kornev <andrewkor...@hotmail.com>
Subject Re: Deserialization of BinaryObject's byte arrays
Date Tue, 07 Nov 2017 19:01:09 GMT
Vladimir,

I appreciate your taking the time to reply. Thank you for sharing your thoughts!

I have to admit I'm still not convinced there is anything inherently wrong about having the
Binary protocol support ByteBuffers as first class citizen. The Binary marshaller already
supports specific collection/map types (such as linked list, array list, hash set etc.) which
all, one could argue, are just "thin wrappers" around "fundamental" sequence/map abstractions.
I don't see why ByteBuffer can't be treated the same way.

Also, your approach, given a relatively simple problem we're trying to solve here, feels very
much over-engineered. As an Ignite user, I'd like to have a *simple* way of deserializing
primitive arrays efficiently. Having to implement a bunch of interfaces to do just that doesn't
fit my (Ignite user's) definition of "simple". Contrary to your statement below, the APIs
you define (or not define) do have a lot to do with user experience.

In any case, here's a new proposal. And this time, it doesn't require modification of the
Binary protocol.

Essentially what we need is just a way to tell the Binary marshaller how array fields are
to be deserialized. It can be done globally (for example, by introducing a new config property
on BinaryConfiguration). Or, it can be done on per binary object instance basis. For example,
we can add a new method to the BinaryObject class (e.g. BinaryObject.withArrayAsBuffer())
to make subsequent calls to BinaryObject.field() return an instance of XxxBuffer class rather
than a raw array (if the underlying field is of course an array).

(Alternatively, we could enhance the BinaryObject class by adding:

Buffer fieldAsBuffer(String fieldName);

method in addition to the existing:

T field(String fieldName);

This would make it possible to read any field (including scalar primitive types and String's)
as an instance of XxxBuffer that wraps the field's raw bytes in the BinaryObject's backing
array.)

Actually I personally like this alternative better -- it's more consistent and symmetric,
and opens up opportunities for further optimization. For example, in some cases (such as equality
checks) I don't need an actual instance of a String -- its character (or even byte) arrays
are sufficient.

The good thing is that neither approach requires the field to have been originally written
(serialized) as XxxBuffer. It could've been just a regular array. The actual type (e.g. byte[]
or ByteBuffer) only matters during deserialization.

Regards
Andrey

________________________________
From: Vladimir Ozerov <vozerov@gridgain.com>
Sent: Tuesday, November 7, 2017 12:25 AM
To: dev@ignite.apache.org
Subject: Re: Deserialization of BinaryObject's byte arrays

Regarding "sefl-imposed rules":
1) Binary protocol is essentially ignite's type system. It is used in many
places: Java, .NET, CPP, SQL (native, JDBC, ODBC), thin client. Of those 5
parties only Java has "ByteBuffer" concept. How other platforms should work
with this type?
2) We compare objects using binary representation
(BinaryArrayIdentityResolver#equals0). So if one user write data as byte
array, and another write the same data as ByteBuffer, resulting objects
will not be equal
These are clear examples on why type system should be as small as possible
- we simply cannot afford extending type system for every minor use case.
Binary protocol is internal thing and has nothing to do with user
experience.

What is worse, ByteBuffer is terrible abstraction as recognized by many
millions of Java users - it is stateful. One cannot read data from it
without changing it's state. And Ignite gives no guarantees that particular
object will be serialized only once per certain operation. One example is
BinaryMetadataCollector - Ignite could go through object twice per
serialization cycle to collect metadata. How are we going to maintain
idempotence in general case?

What could improve UX is public interfaces. Raw example of what we can do:
1) Define generic wrappers over our streams that will give users ability to
read/write byte arrays as they want. E.g.:
interface BinaryArrayWriter {
    write(byte[] src, int offset, int len);
    write(ByteBuffer src, ...);
}

interface BinaryArrayReader {
    int readLength();
    byte[] read();
    void read(byte[] dest, int off, int len);
    void read(ByteBuffer dest);
}

2) Then add new methods to reader/writer/builder. E.g.:
BinaryWriter.writeByteArray(String name, BiConsumer<Object,
BinaryArrayWriter> consumer); // Consumes original field and array writer;
BinaryReader.readByteArray(String name, Function<BinaryArrayReader>
consumer);            // Takes array reader, produces original field

3) Last, introduce field annotation for user convenience:
class MyClass {
    @BinaryArray(writerClass = MyWriterBiConsumer.class, readerClass =
MyReaderFunction.class)
    private ByteBuffer buf;
}

Now all user need to do is to implement write consumer and read function.

Vladimir.


On Tue, Nov 7, 2017 at 10:16 AM, Vladimir Ozerov <vozerov@gridgain.com>
wrote:

> ByteBuffer is not fundamental data type. It is a thin wrapper over array.
> Protocol should contain only fundamental types.
>
> вт, 7 нояб. 2017 г. в 10:05, Andrey Kornev <andrewkornev@hotmail.com>:
>
>> Vladimir,
>>
>> Could you please elaborate as to why adding ByteBuffer to the Binary
>> protocol is “not a good idea”? What is inherently wrong about it?
>>
>> As for your suggestion, while I certainly see your point, it comes with
>> the inconvenience of implementing Binarylizable for every entity type in an
>> application, which makes me wonder if the self imposed rules such as “type
>> system should be kept as small as possible” could be relaxed in the name of
>> greater good and user friendliness. Ignite API is not easy to grok as it is
>> already.
>>
>> Thanks
>> Andrey
>> _____________________________
>> From: Vladimir Ozerov <vozerov@gridgain.com<mailto:vozerov@gridgain.com>>
>> Sent: Monday, November 6, 2017 10:22 PM
>> Subject: Re: Deserialization of BinaryObject's byte arrays
>> To: <dev@ignite.apache.org<mailto:dev@ignite.apache.org>>
>> Cc: Valentin Kulichenko <valentin.kulichenko@gmail.com<mailto:
>> valentin.kulichenko@gmail.com>>
>>
>>
>> Hi Andrey,
>>
>> While we deifnitely need to support byte buffers, adding new type to
>> protocol is not very good idea. Binary protocol is the heart of our
>> ssytem,
>> which is used not oly for plain Java, but for other platforms and SQL. And
>> ByteBuffer is essentially the same byte array, but with different API.
>> What
>> you describe is a special case of byte array read/write. For this reason
>> our type system should be kept as small as possible and it doesn't require
>> new type to be introduced. Instead, we'd better to handle everything on
>> API
>> level only. For example, we can add new methods to reader/writer/builder,
>> which will handle this. E.g.:
>>
>> BinaryWriter.writeByteArray(<some producer here>);
>> BinaryReader.readByteArray(<some consumer here>);
>>
>> Vladimir.
>>
>> On Mon, Nov 6, 2017 at 9:25 PM, Andrey Kornev <andrewkornev@hotmail.com<
>> mailto:andrewkornev@hotmail.com>>
>> wrote:
>>
>> > Will do! Thanks!
>> > ________________________________
>> > From: Valentin Kulichenko <valentin.kulichenko@gmail.com<mailto:
>> valentin.kulichenko@gmail.com>>
>> > Sent: Monday, November 6, 2017 9:17 AM
>> > To: dev@ignite.apache.org<mailto:dev@ignite.apache.org>
>> > Subject: Re: Deserialization of BinaryObject's byte arrays
>> >
>> > This use case was discussed couple of times already, so it seems to be
>> > pretty important for users. And I like the approach suggested by Andrey.
>> >
>> > Andrey, can you create a Jira ticket for this change? Would you like to
>> > contribute it?
>> >
>> > -Val
>> >
>> > On Fri, Nov 3, 2017 at 4:18 PM, Dmitriy Setrakyan <
>> dsetrakyan@apache.org<mailto:dsetrakyan@apache.org>>
>> > wrote:
>> >
>> > > This makes sense to me. Sounds like a useful feature.
>> > >
>> > > Would be nice to hear what others in the community think?
>> > >
>> > > D.
>> > >
>> > > On Fri, Nov 3, 2017 at 1:07 PM, Andrey Kornev <
>> andrewkornev@hotmail.com<mailto:andrewkornev@hotmail.com>>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > We store lots of byte arrays (serialized graph-like data
>> structures) as
>> > > > fields of BinaryObject. Later on, while reading such field,
>> > > > BinaryInputStream implementation creates an on-heap array and copies
>> > the
>> > > > bytes from the BinaryObject's internal backing array to the new
>> array.
>> > > >
>> > > > While in general case it works just fine, in our case, a lot of CPU
>> is
>> > > > spent on copying of the bytes and significant amount garbage is
>> > generated
>> > > > as well.
>> > > >
>> > > > I'd like Ignite Community to consider the following proposal:
>> introduce
>> > > > support for serialization/deserialization of the ByteBuffer class.
A
>> > > > BinaryInputStream implementation would special case ByteBuffer
>> fields
>> > > same
>> > > > way it currently special cases various Java types (collections,
>> dates,
>> > > > timestamps, enums, etc.)
>> > > >
>> > > > The application developer would simply call
>> > > BinaryObjectBuilder.setField(...)
>> > > > passing in an instance of ByteBuffer. During serialization, the
>> > > > ByteBuffer's bytes would simply be copied to the backing array
>> (similar
>> > > to
>> > > > how regular byte arrays are serialized today) and in the binary
>> > object's
>> > > > metadata the field marked as, say, GridBinaryMarshaller.BYTE_
>> BUFFER.
>> > > >
>> > > > During deserialization of the field, instead of allocating a new
>> byte
>> > > > array and transferring the bytes from the BinaryObject's backing
>> array,
>> > > > BinaryInputStream would simply wrap the required sub-array into a
>> > > read-only
>> > > > instance of ByteBuffer using "public static ByteBuffer wrap(byte[]
>> > array,
>> > > > int offset, int length)" and return the instance to the application.
>> > > >
>> > > > This approach doesn't require any array allocation/data copying and
>> is
>> > > > faster and significantly reduces pressure on the garbage collector.
>> The
>> > > > benefits are especially great when application stores significant
>> > amount
>> > > of
>> > > > its data as byte arrays.
>> > > >
>> > > > Clearly, this approach equally applies to arrays of other primitive
>> > > types.
>> > > >
>> > > > A minor complication does arise when dealing with off-heap
>> > BinaryObjects,
>> > > > but nothing that can't solved with a little bit of Java reflection
>> > > wizardry
>> > > > (or by other means).
>> > > >
>> > > > Thanks,
>> > > > Andrey
>> > > >
>> > >
>> >
>>
>>
>>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message