ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: [IMPORTANT] Future of Binary Objects
Date Wed, 21 Nov 2018 08:04:15 GMT
Hi Alexey,

Yes, this looks really similar to Postgres format as well - bitset, fixed
fields, varlen fields. Most probably we need something similar.

On Wed, Nov 21, 2018 at 10:10 AM Alexey Zinoviev <zaleslaw.sin@gmail.com>
wrote:

> I like @Vyacheslav Daradur's approach.
>
> Maybe somebody could have a look at UnsafeRow in Spark
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> UnsafeRow is a concrete InternalRow that represents a mutable internal
> raw-memory (and hence unsafe) binary row format.
>
> P.S. If somebody is interested in this approach, I could share more
> information
>
> Tue, 20 Nov 2018 at 11:33, Sergi Vladykin <sergi.vladykin@gmail.com>:
>
> > I really like the Protobuf format. It is probably not what we need for O(1)
> > field access, but for compact data representation we can derive lots from
> > there.
> >
> > Also IMO, restricting field type change is an absolutely sane idea.
> > The correct way to evolve a schema in the common case is to add new fields
> > and gradually deprecate the old ones. If you can skip default/null fields in
> > the binary format, this approach will not introduce any noticeable
> > performance/size overhead.
> >
> > Sergi
> >
> > Tue, 20 Nov 2018 at 11:12, Vyacheslav Daradur <daradurvs@gmail.com>:
> >
> > > I think one possible way to reduce overhead and TCO is the SQL schema
> > > approach.
> > >
> > > That assumes that metadata will be stored separately from serialized
> > > data to reduce size.
> > > In this case, most advantages of Binary Objects, such as O(1) field
> > > access and access without deserialization, can still be achieved.
> > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <vozerov@gridgain.com>
> > > wrote:
> > > >
> > > > Hi Alexey,
> > > >
> > > > Binary Objects only.
> > > >
> > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <zaleslaw.sin@gmail.com>
> > > > wrote:
> > > >
> > > > > Are we discussing Core features only here, or the roadmap for all
> > > > > components?
> > > > >
> > > > > Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov <vozerov@gridgain.com>:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > > It is very likely that Apache Ignite 3.0 will be released next year. So we
> > > > > > > need to start thinking about major product improvements. I'd like to start
> > > > > > > with binary objects.
> > > > > >
> > > > > > > Currently they are one of the main limiting factors for the product. They
> > > > > > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > > > > > compared to other vendors. They are slow - not suitable for SQL at all.
> > > > > >
> > > > > > > I would like to ask all of you who worked with binary objects to share your
> > > > > > > feedback and ideas, so that we understand how they should look in AI 3.0.
> > > > > > > This is a brainstorm - let's accumulate ideas first and minimize criticism.
> > > > > > > Then we will work on ideas in separate topics.
> > > > > >
> > > > > > 1) Historical background
> > > > > >
> > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started working
> > > > > > > on .NET and CPP clients. During design we had several ideas in mind:
> > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > - interoperability between Java, .NET and CPP.
> > > > > >
> > > > > > > Since then a number of other concepts were mixed into the cocktail:
> > > > > > - Affinity key fields
> > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > - Binary Object as storage format
> > > > > >
> > > > > > 2) My proposals
> > > > > >
> > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > > > > > Efficient storage typically has <10 bytes overhead per row (no metadata, no
> > > > > > > length, no hash code, etc), allows super-fast field access, supports
> > > > > > > different string formats (ASCII, UTF-8, etc), supports different temporal
> > > > > > > types (date, time, timestamp, timestamp with timezone, etc), and stores
> > > > > > > these types as efficiently as possible.
> > > > > >
> > > > > > > What we need is to introduce an interface which will convert a pair of
> > > > > > > key-value objects into a row. This row will be used to store data and to
> > > > > > > get fields from it. Care about memory consumption, need SQL and strict
> > > > > > > schema - use one format. Need flexibility and prefer key-value access - use
> > > > > > > another format which will store binary objects unchanged (current
> > > > > > > behavior).
> > > > > >
> > > > > > > interface DataRowFormat {
> > > > > > >     DataRow create(Object key, Object value); // primitives or binary objects
> > > > > > >     DataRowMetadata metadata();
> > > > > > > }
> > > > > >
> > > > > > 2.2) Remove affinity field from metadata
> > > > > > > Affinity rules are governed by cache, not type. We should remove
> > > > > > > "affinityFieldName" from metadata.
> > > > > >
> > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > I do not know why we did that in the first place. This restriction prevents
> > > > > > > type evolution and confuses users.
> > > > > >
> > > > > > > 2.4) Use bitmaps for "null" and default values and for fixed-length fields,
> > > > > > > put fixed-length fields before variable-length.
> > > > > > > Motivation: to save space.
> > > > > >
> > > > > > What else? Please share your ideas.
> > > > > >
> > > > > > Vladimir.
> > > > > >
> > > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
> >
>
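
For reference, the DataRowFormat interface from 2.1 of the quoted proposal
could be fleshed out roughly as below. Only DataRowFormat and its two method
signatures come from the proposal. The shapes of DataRow and DataRowMetadata
and the PassThroughRowFormat class are hypothetical, and the pass-through
variant is just one reading of the "store binary objects unchanged" option.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical: describes the fields a format exposes. */
interface DataRowMetadata {
    List<String> fieldNames();
}

/** Hypothetical: positional access to the stored fields. */
interface DataRow {
    Object field(int index);
}

/** The interface as written in the proposal. */
interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects
    DataRowMetadata metadata();
}

/** Stores key and value unchanged, in the spirit of the "current behavior" option. */
final class PassThroughRowFormat implements DataRowFormat {
    @Override
    public DataRow create(Object key, Object value) {
        Object[] cells = {key, value};
        return index -> cells[index];
    }

    @Override
    public DataRowMetadata metadata() {
        return () -> Arrays.asList("key", "value");
    }
}
```

A strict-schema format would implement the same interface but flatten the
key and value into a compact row, so callers could switch formats without
changing how they read fields.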
