ignite-dev mailing list archives

From Dmitriy Setrakyan <dsetrak...@apache.org>
Subject Re: BinaryObject pros/cons
Date Tue, 01 Nov 2016 00:11:27 GMT
In my opinion, writing nulls or default values on the wire or in-memory is
plain wasteful. I agree with Vladimir that schema should be constant, but
internally we should not store the default values at all.

It sounds like a relatively simple task to implement. Do we have a ticket
for it?

D.
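
A minimal sketch of the reserved-offset idea discussed in the quoted thread
below, in Java. It is not Ignite's actual binary format: the class name, the
ABSENT sentinel, the long-only fields and the 4-byte footer entries are all
assumptions made for illustration. The number of fields (the schema) stays
constant; only the payload shrinks when a field holds its default value.

```java
import java.nio.ByteBuffer;

final class SparseFooterSketch {
    /** Hypothetical reserved offset: "field not written, use the default". */
    static final int ABSENT = Integer.MAX_VALUE;

    /**
     * Assumed layout: [non-zero values, 8 bytes each][footer: one 4-byte offset per field].
     * Zero-valued fields get no payload bytes, only an ABSENT footer entry.
     */
    static byte[] write(long[] fields) {
        int nonZero = 0;
        for (long f : fields) {
            if (f != 0) nonZero++;
        }

        ByteBuffer buf = ByteBuffer.allocate(nonZero * 8 + fields.length * 4);
        int[] offsets = new int[fields.length];

        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == 0) {
                offsets[i] = ABSENT;          // default value: nothing stored
            } else {
                offsets[i] = buf.position();  // offset relative to object start
                buf.putLong(fields[i]);
            }
        }

        // The footer always has one entry per field, so the schema does not change.
        for (int off : offsets) {
            buf.putInt(off);
        }
        return buf.array();
    }

    /** Reads field idx; an ABSENT footer entry yields the default value 0. */
    static long read(byte[] bytes, int fieldCount, int idx) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int footerStart = bytes.length - fieldCount * 4;
        int off = buf.getInt(footerStart + idx * 4);
        return off == ABSENT ? 0L : buf.getLong(off);
    }
}
```

With this layout, writing the four fields {0, 42, 0, 7} takes 2 * 8 payload
bytes plus 4 * 4 footer bytes. As noted in the thread, a real implementation
would have to reserve the sentinel out of an offset range that is currently
fully used.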

On Mon, Oct 31, 2016 at 1:00 PM, Vladimir Ozerov <vozerov@gridgain.com>
wrote:

> Igor,
>
> Good catch. Probably some MAX value could help us here.
>
> On Mon, Oct 31, 2016 at 9:17 PM, Igor Sapego <isapego@gridgain.com> wrote:
>
> > Valentin,
> >
> > -1 was just an example. I've checked: currently we use the entire
> > possible range of offset values.
> > So if we are going to use the suggested approach, then we need to reserve
> > some value and adjust the serialization/deserialization algorithms.
> >
> > Best Regards,
> > Igor
> >
> > On Mon, Oct 31, 2016 at 8:46 PM, Valentin Kulichenko <
> > valentin.kulichenko@gmail.com> wrote:
> >
> > > Makes sense to me, but not sure about -1 in particular. Is this offset
> > > relative to object start position? What values can it have?
> > >
> > > -Val
> > >
> > > On Mon, Oct 31, 2016 at 10:38 AM, Igor Sapego <isapego@gridgain.com>
> > > wrote:
> > >
> > >> Vladimir,
> > >>
> > >> How about some reserved value? I.e., a -1 offset means that a
> > >> default/null value should be used?
> > >>
> > >> Best Regards,
> > >> Igor
> > >>
> > >> On Mon, Oct 31, 2016 at 5:05 PM, Vladimir Ozerov <
> vozerov@gridgain.com>
> > >> wrote:
> > >>
> > >>> Valya,
> > >>>
> > >>> Do you have any ideas on how to implement this? We write field offsets
> > >>> in the footer. If a field is not written, then what should be used for
> > >>> its offset?
> > >>>
> > >>> On Mon, Oct 31, 2016 at 4:56 PM, Valentin Kulichenko <
> > >>> valentin.kulichenko@gmail.com> wrote:
> > >>>
> > >>> > Vladimir,
> > >>> >
> > >>> > These are good points, but I'm not suggesting changing the schema.
> > >>> > If one writes five fields, the schema should have five fields in any
> > >>> > case, regardless of values. I only suggest changing the internal
> > >>> > representation of the object and not saving fields with default
> > >>> > values in the byte array, as we don't really need them there.
> > >>> >
> > >>> > -Val
> > >>> >
> > >>> > On Sun, Oct 30, 2016 at 12:24 PM, Vladimir Ozerov <
> > >>> vozerov@gridgain.com>
> > >>> > wrote:
> > >>> >
> > >>> >> Valya,
> > >>> >>
> > >>> >> I have several concerns:
> > >>> >> 1) Correctness: hasField() will not work properly. But we can
> > >>> >> probably fix that by adding this info to the schema.
> > >>> >> 2) Performance: we have lots of optimizations which depend on either
> > >>> >> a "stable" object schema or a low number of schemas. We will
> > >>> >> effectively turn them off.
> > >>> >> But what concerns me even more is that we may end up with an
> > >>> >> enormous number of schemas. E.g. consider an object with 10 numeric
> > >>> >> fields. If all fields could be zero, we may end up with something
> > >>> >> like 2^10 schemas.
> > >>> >>
> > >>> >> Vladimir.
> > >>> >>
> > >>> >> On Oct 29, 2016, at 0:37, "Valentin Kulichenko" <
> > >>> >> valentin.kulichenko@gmail.com> wrote:
> > >>> >>
> > >>> >> Vova,
> > >>> >>>
> > >>> >>> Why do we need to write zeros and nulls in the first place? What's
> > >>> >>> the value of having them in the byte array?
> > >>> >>>
> > >>> >>> -Val
> > >>> >>>
> > >>> >>> On Fri, Oct 28, 2016 at 1:18 AM, Vladimir Ozerov <
> > >>> vozerov@gridgain.com>
> > >>> >>> wrote:
> > >>> >>>
> > >>> >>>> Valya,
> > >>> >>>>
> > >>> >>>> Currently a null value is written as one byte, while a zero value
> > >>> >>>> of long type is written as 9 bytes. I want to improve that and
> > >>> >>>> write zeros as one byte as well.
> > >>> >>>>
> > >>> >>>> As for var-length encoding, I am strongly against it. It saves IO
> > >>> >>>> and memory at the cost of CPU. If we encode numbers in this way, we
> > >>> >>>> will slow down SQL (which is already not very fast, to be honest),
> > >>> >>>> because instead of a single memory read we will have to perform
> > >>> >>>> multiple reads and then apply some mechanics to restore the
> > >>> >>>> original value. We already have such a problem with Strings: Java
> > >>> >>>> stores them as UTF-16, but we encode them as UTF-8. As a result,
> > >>> >>>> every read of a string field in SQL results in decoding overhead.
> > >>> >>>>
> > >>> >>>> Vladimir.
> > >>> >>>>
> > >>> >>>> On Fri, Oct 28, 2016 at 6:07 AM, Valentin Kulichenko
<
> > >>> >>>> valentin.kulichenko@gmail.com> wrote:
> > >>> >>>>
> > >>> >>>>> Cross-posting this to dev list.
> > >>> >>>>>
> > >>> >>>>> Vladimir,
> > >>> >>>>>
> > >>> >>>>> To be honest, I don't see much difference between null values for
> > >>> >>>>> objects and zero values for primitives. From the BinaryObject
> > >>> >>>>> semantics standpoint, both are default values for the
> > >>> >>>>> corresponding types. These values will be returned from the
> > >>> >>>>> BinaryObject.field() method regardless of whether we actually
> > >>> >>>>> save them in the byte array or not. Having said that, why don't
> > >>> >>>>> we just skip them during write?
> > >>> >>>>>
> > >>> >>>>> Your optimization will still be useful though, because there are
> > >>> >>>>> often a lot of ints and longs that are not zeros, but are still
> > >>> >>>>> small and can fit in 1-2 bytes. We already added such compaction
> > >>> >>>>> in direct message marshaling, and it reduced overall traffic by
> > >>> >>>>> around 30%.
> > >>> >>>>>
> > >>> >>>>> -Val
> > >>> >>>>>
> > >>> >>>>>
> > >>> >>>>> On Thu, Oct 27, 2016 at 2:21 PM, Vladimir Ozerov
<
> > >>> vozerov@gridgain.com
> > >>> >>>>> > wrote:
> > >>> >>>>>
> > >>> >>>>>> Hi,
> > >>> >>>>>>
> > >>> >>>>>> I am not very concerned with the overhead of null fields, because
> > >>> >>>>>> usually it won't be significant. However, there is a problem
> > >>> >>>>>> with zeros. A user object might have lots of int/long zeros;
> > >>> >>>>>> this is not uncommon. And each zero will consume 4-8 additional
> > >>> >>>>>> bytes. We will probably implement a special optimization which
> > >>> >>>>>> will write such fields in a special compact format.
> > >>> >>>>>>
> > >>> >>>>>> Vladimir.
> > >>> >>>>>>
> > >>> >>>>>> On Thu, Oct 27, 2016 at 10:55 PM, vkulichenko
<
> > >>> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > >>> >>>>>>
> > >>> >>>>>>> Hi,
> > >>> >>>>>>>
> > >>> >>>>>>> Yes, null values consume memory. I believe this can be
> > >>> >>>>>>> optimized, but I haven't seen issues with this so far. Unless
> > >>> >>>>>>> you have hundreds of fields, most of which are nulls (a very
> > >>> >>>>>>> rare case), the overhead is minimal.
> > >>> >>>>>>>
> > >>> >>>>>>> -Val
> > >>> >>>>>>>
> > >>> >>>>>>>
> > >>> >>>>>>>
> > >>> >>>>>>> --
> > >>> >>>>>>> View this message in context:
> > >>> >>>>>>> http://apache-ignite-users.70518.x6.nabble.com/BinaryObject-pros-cons-tp8541p8563.html
> > >>> >>>>>>> Sent from the Apache Ignite Users mailing list archive at
> > >>> >>>>>>> Nabble.com.
> > >>> >>>>>>>
> > >>> >>>>>>
> > >>> >>>>>>
> > >>> >>>>>
> > >>> >>>>
> > >>> >>>
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
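
For reference, the var-length encoding debated in the quoted thread can be
sketched roughly as follows, in Java. This is a generic 7-bits-per-byte
scheme, not the encoding used by Ignite's direct message marshaling; the
class and method names are hypothetical. It shows the trade-off raised above:
small values take fewer bytes, but decoding needs a loop of byte reads and
shifts instead of a single fixed-width read.

```java
import java.nio.ByteBuffer;

final class VarLongSketch {
    /** Writes v using 7 payload bits per byte; the high bit marks "more bytes follow". */
    static void putVarLong(ByteBuffer buf, long v) {
        while ((v & ~0x7FL) != 0) {
            buf.put((byte) ((v & 0x7F) | 0x80));
            v >>>= 7;  // unsigned shift, so negative values terminate (in 10 bytes)
        }
        buf.put((byte) v);
    }

    /** Reads a value written by putVarLong: one read, mask and shift per byte. */
    static long getVarLong(ByteBuffer buf) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

A fixed-width field would be read with one ByteBuffer.getLong() call, which
is the cost difference the objection about SQL read performance refers to.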
