ignite-dev mailing list archives

From Igor Sapego <isap...@apache.org>
Subject Re: [IMPORTANT] Future of Binary Objects
Date Wed, 21 Nov 2018 09:44:35 GMT
I want to offer several optimizations:

1. If we store field metadata anyway and are going to store bitmasks for
null fields, should we also drop the type "header" byte from each object
field? We can get the field type from the metadata.

2. If several consecutive fields are fixed-length, we can avoid storing
their offsets, since these offsets are easy to calculate. We can even store
the computed offsets in the metadata to improve performance.

3. If these two optimizations are adopted, the docs should mention that it
is highly recommended to write fixed-size fields at the beginning of the
object.
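A minimal Java sketch of points 1 and 2. It assumes a hypothetical per-type schema that lists field types in declaration order; `FieldType` and `FixedOffsets` are illustrative names, not Ignite APIs. Because every field's type comes from the metadata, a row needs no per-field type byte. The offsets of the leading fixed-length fields, plus the offset of the first variable-length field, can be computed once per schema. The sketch also assumes that fixed-length slots are reserved even when the null bitmask marks a value as null, since otherwise nulls would shift the offsets.

```java
import java.util.List;

/** Hypothetical field types; a negative size marks a variable-length type. */
enum FieldType {
    BYTE(1), INT(4), LONG(8), DOUBLE(8), STRING(-1), BYTES(-1);

    final int size;

    FieldType(int size) { this.size = size; }

    boolean isFixed() { return size >= 0; }
}

/**
 * Offsets derived from metadata alone, computed once per schema.
 * Assumes fixed-length slots are reserved even for null values.
 */
final class FixedOffsets {
    /** offsets[i] is the byte offset of field i, or -1 if each row must store it. */
    final int[] offsets;
    /** How many offsets each row still has to store. */
    final int storedOffsetCount;

    FixedOffsets(List<FieldType> schema) {
        offsets = new int[schema.size()];
        int pos = 0;
        int stored = 0;
        boolean inFixedPrefix = true; // true while all preceding fields are fixed-length
        for (int i = 0; i < schema.size(); i++) {
            FieldType type = schema.get(i);
            if (inFixedPrefix) {
                offsets[i] = pos; // known from metadata: everything before it is fixed-length
                if (type.isFixed()) {
                    pos += type.size;
                } else {
                    inFixedPrefix = false; // its length, and so all later positions, depend on row data
                }
            } else {
                offsets[i] = -1; // must be stored in the row
                stored++;
            }
        }
        storedOffsetCount = stored;
    }
}
```

With the schema `INT, LONG, STRING, INT`, one offset per row still has to be stored. With the fixed-length fields declared first (`INT, LONG, INT, STRING`), none does, which is the ordering recommended in point 3.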

Best Regards,
Igor


On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <dmekhanikov@gmail.com> wrote:

> People often ask whether they can store their data in the same format
> that they use in their applications.
> If you use Avro everywhere in your application, why not store data in the
> same format in Ignite?
> So, how about introducing an interface that enlists all the operations we
> need, and using this interface everywhere without relying on any specific
> implementation?
> *BinaryObject* looks like a suitable interface, but the only implementation
> that you can get from Ignite is *BinaryObjectImpl*.
> I think we should make Ignite extensible and provide the capability to
> specify your own data format by implementing the corresponding interfaces.
> So, if you like JSONB or Protobuf or whatever else, you could enable a
> module for the corresponding format and use it for storing the data.
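A rough sketch of such an extension point. It assumes a hypothetical `StorageFormat` interface and an explicit `StorageFormats` registry; neither is an Ignite API, and a real module system might rely on `java.util.ServiceLoader` instead. The toy `JdkSerializationFormat` only shows the plumbing. It deserializes the whole row on every field read, so it has none of the O(1) properties discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical pluggable format; the storage layer would code against this, not an implementation. */
interface StorageFormat {
    String name();

    byte[] write(Map<String, Object> fields);

    Object readField(byte[] data, String field);
}

/** Hypothetical registry that a format module (Avro, Protobuf, JSONB, ...) would register itself in. */
final class StorageFormats {
    private static final Map<String, StorageFormat> FORMATS = new ConcurrentHashMap<>();

    static void register(StorageFormat format) { FORMATS.put(format.name(), format); }

    static StorageFormat get(String name) {
        StorageFormat format = FORMATS.get(name);
        if (format == null) throw new IllegalArgumentException("Unknown format: " + name);
        return format;
    }
}

/** Toy format backed by JDK serialization: correct, but every field read deserializes the whole row. */
final class JdkSerializationFormat implements StorageFormat {
    @Override public String name() { return "jdk"; }

    @Override public byte[] write(Map<String, Object> fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(fields)); // values must be Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the stream is closed (flushed) by now
    }

    @Override public Object readField(byte[] data, String field) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Map<?, ?> fields = (Map<?, ?>) in.readObject();
            return fields.get(field);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An Avro or Protobuf module would supply its own `StorageFormat` and register it under its own name. The rest of the code only sees the interface.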
>
> Denis
>
> On Wed, 21 Nov 2018 at 10:10, Alexey Zinoviev <zaleslaw.sin@gmail.com> wrote:
>
> > I like @Vyacheslav Daradur's approach.
> >
> > Maybe somebody could take a look at UnsafeRow in Spark:
> >
> > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > raw-memory (and hence unsafe) binary row format.
> >
> > P.S. If somebody is interested in this approach, I could share more
> > information.
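For reference, here is a simplified Java sketch of the layout described in the UnsafeRow Javadoc. The layout has three regions. First comes a null bitset padded to 8-byte words. Next comes one 8-byte slot per field. Last is a variable-length region, where each variable-length value's slot holds its offset and size packed into one long. `SlotRow` is a hypothetical name, and the sketch is not Spark's code. It runs on a heap `ByteBuffer` instead of raw memory and does no bounds management beyond what the caller sizes up front.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Simplified UnsafeRow-style layout (not Spark's code), on a heap ByteBuffer:
 * [null bitset in 8-byte words][one 8-byte slot per field][variable-length region].
 * Fixed-length values live in their slot; a variable-length value's slot holds (offset << 32 | size).
 */
final class SlotRow {
    private final ByteBuffer buf;
    private final int bitsetBytes;
    private int cursor; // next free byte in the variable-length region

    SlotRow(int numFields, int varCapacity) {
        bitsetBytes = ((numFields + 63) / 64) * 8; // one bit per field, rounded up to whole words
        buf = ByteBuffer.allocate(bitsetBytes + numFields * 8 + varCapacity);
        cursor = bitsetBytes + numFields * 8;
    }

    private int slot(int i) { return bitsetBytes + i * 8; }

    void setNull(int i) {
        int word = (i / 64) * 8;
        buf.putLong(word, buf.getLong(word) | (1L << (i % 64)));
    }

    boolean isNull(int i) {
        return (buf.getLong((i / 64) * 8) & (1L << (i % 64))) != 0;
    }

    void setLong(int i, long value) { buf.putLong(slot(i), value); }

    long getLong(int i) { return buf.getLong(slot(i)); }

    /** Appends UTF-8 bytes to the variable-length region, padded to 8 bytes; varCapacity must fit them. */
    void setString(int i, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(bytes, 0, buf.array(), cursor, bytes.length);
        buf.putLong(slot(i), ((long) cursor << 32) | bytes.length);
        cursor += (bytes.length + 7) & ~7;
    }

    String getString(int i) {
        long packed = buf.getLong(slot(i));
        int offset = (int) (packed >>> 32);
        int size = (int) packed; // low 32 bits
        return new String(buf.array(), offset, size, StandardCharsets.UTF_8);
    }
}
```

Every field, fixed or variable, is reached in O(1) through its slot. The cost is 8 bytes per field even for small types.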
> >
> > On Tue, 20 Nov 2018 at 11:33, Sergi Vladykin <sergi.vladykin@gmail.com> wrote:
> >
> > > I really like the Protobuf format. It is probably not what we need for
> > > O(1) field access, but we can borrow a lot from it for compact data
> > > representation.
> > >
> > > Also, IMO, restricting field type changes is an absolutely sane idea.
> > > The correct way to evolve a schema in the common case is to add new fields
> > > and gradually deprecate the old ones. If you can skip default/null fields
> > > in the binary format, this approach will not introduce any noticeable
> > > performance/size overhead.
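To illustrate the compact representation Protobuf uses, here is a minimal sketch of three of its building blocks: base-128 varints, ZigZag encoding for signed values, and `fieldNumber << 3 | wireType` tags. `Varints` is a hypothetical helper that only supports the varint wire type. It also shows the schema-evolution point above: absent (null) fields cost zero bytes, and a reader simply skips field numbers it does not know.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Protobuf-style encoding sketch: base-128 varints, ZigZag for signed values, tag = fieldNo << 3 | wireType. */
final class Varints {
    static final int WIRE_VARINT = 0;

    /** Maps small negative numbers to small unsigned ones: 0->0, -1->1, 1->2, -2->3, ... */
    static long zigZag(long v) { return (v << 1) ^ (v >> 63); }

    static long unZigZag(long v) { return (v >>> 1) ^ -(v & 1); }

    /** 7 payload bits per byte, high bit set on every byte except the last. */
    static void writeVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    /** Writes only non-null fields, keyed by field number; absent fields cost zero bytes. */
    static byte[] encode(Map<Integer, Long> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, Long> e : fields.entrySet()) {
            if (e.getValue() == null) continue;
            writeVarint(out, ((long) e.getKey().intValue() << 3) | WIRE_VARINT);
            writeVarint(out, zigZag(e.getValue()));
        }
        return out.toByteArray();
    }

    /** Keeps the fields this reader knows and skips the rest (e.g. fields added in a newer schema). */
    static Map<Integer, Long> decode(byte[] data, Set<Integer> known) {
        Map<Integer, Long> result = new HashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            int fieldNo = (int) (readVarint(data, pos) >>> 3);
            long value = unZigZag(readVarint(data, pos)); // varint-only, so skipping = reading the value
            if (known.contains(fieldNo)) result.put(fieldNo, value);
        }
        return result;
    }
}
```

For example, `writeVarint` encodes 300 as the two bytes `0xAC 0x02`. A row with three fields, one of them null, costs 4 bytes when each present value and tag fits in one byte.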
> > >
> > > Sergi
> > >
> > > On Tue, 20 Nov 2018 at 11:12, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > >
> > > > I think one possible way to reduce overhead and TCO is the SQL Schema
> > > > approach.
> > > >
> > > > It assumes that metadata will be stored separately from the serialized
> > > > data to reduce size.
> > > > In this case, most advantages of Binary Objects, like O(1) field access
> > > > and access without deserialization, can still be achieved.
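Here is a minimal sketch of storing metadata separately. It assumes a hypothetical `SchemaRegistry` that keeps each `Schema` once, with rows that carry only a 4-byte schema ID followed by the field values. For brevity, only `long` fields are supported. A field is read directly from the bytes through the offset recorded in the schema, without deserializing the row.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical schema stored once in a registry instead of inside every row. Long fields only. */
final class Schema {
    final int id;
    final Map<String, Integer> offsets = new HashMap<>();
    final int rowSize;

    Schema(int id, List<String> longFields) {
        this.id = id;
        int pos = Integer.BYTES; // every row starts with the 4-byte schema id
        for (String field : longFields) {
            offsets.put(field, pos);
            pos += Long.BYTES;
        }
        rowSize = pos;
    }
}

final class SchemaRegistry {
    private final List<Schema> schemas = new ArrayList<>(); // list index == schema id

    Schema register(List<String> longFields) {
        Schema schema = new Schema(schemas.size(), longFields);
        schemas.add(schema);
        return schema;
    }

    /** Values must be given in the schema's field order. */
    byte[] write(Schema schema, long... values) {
        if (values.length != schema.offsets.size()) throw new IllegalArgumentException("Wrong field count");
        ByteBuffer buf = ByteBuffer.allocate(schema.rowSize).putInt(schema.id);
        for (long v : values) buf.putLong(v);
        return buf.array();
    }

    /** Reads one field straight from the bytes: no per-row metadata, no deserialization. */
    long readLong(byte[] row, String field) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        Schema schema = schemas.get(buf.getInt(0));
        Integer offset = schema.offsets.get(field);
        if (offset == null) throw new IllegalArgumentException("No field " + field + " in schema " + schema.id);
        return buf.getLong(offset);
    }
}
```

Here a two-field row takes 20 bytes: 4 bytes of schema ID plus two 8-byte values. No field names or types are repeated per row.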
> > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > >
> > > > > Hi Alexey,
> > > > >
> > > > > Binary Objects only.
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <zaleslaw.sin@gmail.com> wrote:
> > > > >
> > > > > > Are we discussing only Core features here, or the roadmap for all
> > > > > > components?
> > > > > >
> > > > > > On Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > It is very likely that Apache Ignite 3.0 will be released next year, so
> > > > > > > we need to start thinking about major product improvements. I'd like
> > > > > > > to start with binary objects.
> > > > > > >
> > > > > > > Currently they are one of the main limiting factors for the product.
> > > > > > > They are fat: 30+ bytes of overhead on average, which makes the TCO of
> > > > > > > Apache Ignite high compared to other vendors. They are slow: not
> > > > > > > suitable for SQL at all.
> > > > > > >
> > > > > > > I would like to ask all of you who have worked with binary objects to
> > > > > > > share your feedback and ideas, so that we understand what they should
> > > > > > > look like in AI 3.0. This is a brainstorm: let's accumulate ideas first
> > > > > > > and minimize criticism. Then we will work on the ideas in separate
> > > > > > > topics.
> > > > > > >
> > > > > > > 1) Historical background
> > > > > > >
> > > > > > > BO were implemented around 2014 (Apache Ignite 1.5), when we started
> > > > > > > working on the .NET and CPP clients. During the design we had several
> > > > > > > ideas in mind:
> > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > - interoperability between Java, .NET and CPP.
> > > > > > >
> > > > > > > Since then, a number of other concepts were mixed into the cocktail:
> > > > > > > - Affinity key fields
> > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > - Binary Object as storage format
> > > > > > >
> > > > > > > 2) My proposals
> > > > > > >
> > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > Binary Objects are terrible candidates for storage: too fat, too slow.
> > > > > > > Efficient storage typically has <10 bytes of overhead per row (no
> > > > > > > metadata, no length, no hash code, etc.), allows super-fast field
> > > > > > > access, supports different string formats (ASCII, UTF-8, etc.) and
> > > > > > > different temporal types (date, time, timestamp, timestamp with
> > > > > > > timezone, etc.), and stores these types as efficiently as possible.
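Here is one way to store temporal types compactly, sketched with standard `java.time` conversions only. A date becomes a 4-byte epoch-day count. A time becomes 8 bytes of nanoseconds since midnight. A timestamp with timezone becomes epoch seconds, nanoseconds, and a UTC offset in seconds, 16 bytes in total. `TemporalCodec` is hypothetical. Storing a fixed `ZoneOffset` rather than a region ID such as `Europe/Moscow` is a simplifying assumption.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

/** Hypothetical compact encodings for temporal types, built only on java.time conversions. */
final class TemporalCodec {
    static final int DATE_BYTES = 4;                 // days since 1970-01-01 fit in an int
    static final int TIME_BYTES = 8;                 // nanoseconds since midnight
    static final int TIMESTAMP_TZ_BYTES = 8 + 4 + 4; // epoch seconds, nanos, offset seconds

    static void putDate(ByteBuffer buf, LocalDate date) {
        buf.putInt(Math.toIntExact(date.toEpochDay()));
    }

    static LocalDate getDate(ByteBuffer buf) { return LocalDate.ofEpochDay(buf.getInt()); }

    static void putTime(ByteBuffer buf, LocalTime time) { buf.putLong(time.toNanoOfDay()); }

    static LocalTime getTime(ByteBuffer buf) { return LocalTime.ofNanoOfDay(buf.getLong()); }

    static void putTimestampTz(ByteBuffer buf, OffsetDateTime ts) {
        Instant instant = ts.toInstant();
        buf.putLong(instant.getEpochSecond())
           .putInt(instant.getNano())
           .putInt(ts.getOffset().getTotalSeconds());
    }

    static OffsetDateTime getTimestampTz(ByteBuffer buf) {
        // Java evaluates arguments left to right: seconds first, then nanos.
        Instant instant = Instant.ofEpochSecond(buf.getLong(), buf.getInt());
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(buf.getInt());
        return OffsetDateTime.ofInstant(instant, offset);
    }
}
```

All three values together take 28 bytes, and each round-trips exactly.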
> > > > > > >
> > > > > > > What we need is to introduce an interface which will convert a pair of
> > > > > > > key-value objects into a row. This row will be used to store data and
> > > > > > > to get fields from it. If you care about memory consumption and need
> > > > > > > SQL and a strict schema, use one format. If you need flexibility and
> > > > > > > prefer key-value access, use another format, which stores binary
> > > > > > > objects unchanged (current behavior).
> > > > > > >
> > > > > > > interface DataRowFormat {
> > > > > > >     DataRow create(Object key, Object value); // primitives or binary objects
> > > > > > >     DataRowMetadata metadata();
> > > > > > > }
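Below is a self-contained Java version of the proposal above. It assumes hypothetical definitions for `DataRow` and `DataRowMetadata`; the methods `key()`, `field()` and `fieldNames()` are illustrative. A `Map` stands in for a binary object. `PassThroughFormat` models the current behavior. `SchemaFormat` models the strict-schema format, which flattens the value into columns and rejects fields outside the schema.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins for the types named in the proposal. */
interface DataRow {
    Object key();

    Object field(String name);
}

interface DataRowMetadata {
    List<String> fieldNames();
}

interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects

    DataRowMetadata metadata();
}

/** "Current behavior": the value is kept unchanged and field access is delegated to it. */
final class PassThroughFormat implements DataRowFormat {
    @Override public DataRow create(Object key, Object value) {
        return new DataRow() {
            @Override public Object key() { return key; }

            @Override public Object field(String name) {
                // A real implementation would ask the binary object for the field.
                return value instanceof Map ? ((Map<?, ?>) value).get(name) : null;
            }
        };
    }

    @Override public DataRowMetadata metadata() { return () -> List.of(); } // no fixed schema
}

/** Strict-schema format: the value is flattened into a fixed column order. */
final class SchemaFormat implements DataRowFormat {
    private final List<String> columns;

    SchemaFormat(List<String> columns) { this.columns = List.copyOf(columns); }

    @Override public DataRow create(Object key, Object value) {
        Map<?, ?> fields = (Map<?, ?>) value; // stand-in for reading a binary object's fields
        for (Object name : fields.keySet()) {
            if (!columns.contains(name)) throw new IllegalArgumentException("Not in schema: " + name);
        }
        Object[] cells = new Object[columns.size()];
        for (int i = 0; i < cells.length; i++) cells[i] = fields.get(columns.get(i));
        return new DataRow() {
            @Override public Object key() { return key; }

            @Override public Object field(String name) {
                int idx = columns.indexOf(name); // a real format would precompute offsets
                if (idx < 0) throw new IllegalArgumentException("Unknown column: " + name);
                return cells[idx];
            }
        };
    }

    @Override public DataRowMetadata metadata() { return () -> columns; }
}
```

The caller picks a format per cache. Both formats are used through the same `DataRowFormat` interface, so storage and SQL code would not need to know which one is active.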
> > > > > > >
> > > > > > > 2.2) Remove the affinity field from metadata
> > > > > > > Affinity rules are governed by the cache, not by the type. We should
> > > > > > > remove "affinityFieldName" from the metadata.
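Here is a minimal sketch of affinity configured per cache rather than per type. It assumes a hypothetical `CacheAffinity` helper and models keys as field maps. A plain `hashCode` modulo the partition count stands in for Ignite's actual affinity function. With this approach, the same key type could be colocated by different fields in different caches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical: the affinity field belongs to cache configuration, not to the type's metadata. */
final class CacheAffinity {
    private final Map<String, String> affinityFieldByCache = new ConcurrentHashMap<>();
    private final int partitions;

    CacheAffinity(int partitions) { this.partitions = partitions; }

    void configure(String cacheName, String affinityField) {
        affinityFieldByCache.put(cacheName, affinityField);
    }

    /** Keys are modeled as field maps; the cache, not the key type, decides what to hash. */
    int partition(String cacheName, Map<String, Object> key) {
        String field = affinityFieldByCache.get(cacheName);
        Object affinityKey = field == null ? key : key.get(field); // no affinity field: hash the whole key
        if (affinityKey == null) throw new IllegalArgumentException("Key has no field " + field);
        return Math.floorMod(affinityKey.hashCode(), partitions);
    }
}
```

Two orders of the same customer land in the same partition of the `orders` cache, while another cache could use the same key type with a different affinity field.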
> > > > > > >
> > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > I do not know why we did that in the first place. This restriction
> > > > > > > prevents type evolution and confuses users.
> > > > > > >
> > > > > > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > > > > > > fields; put fixed-length fields before variable-length ones.
> > > > > > > Motivation: to save space.
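Here is a minimal sketch of the null/default bitmaps. It assumes a hypothetical row of nullable `long` fields whose default value is 0. Each row starts with a null bitmap and a default bitmap, and only the remaining values are written to the payload. `BitmapRow` is an illustrative name. Reading here walks the fields in order. A real format could instead find a field's payload position by counting the set bits before it, for example with `Integer.bitCount`.

```java
import java.nio.ByteBuffer;

/** Sketch: per-row bitmaps mark null and default (zero) fields, which then take no payload bytes. */
final class BitmapRow {
    static byte[] write(Long[] fields) {
        int maskBytes = (fields.length + 7) / 8;
        byte[] nulls = new byte[maskBytes];
        byte[] defaults = new byte[maskBytes];
        ByteBuffer payload = ByteBuffer.allocate(fields.length * Long.BYTES);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null) nulls[i / 8] |= 1 << (i % 8);
            else if (fields[i] == 0L) defaults[i / 8] |= 1 << (i % 8);
            else payload.putLong(fields[i]);
        }
        // Layout: [null bitmap][default bitmap][values that are neither null nor default]
        ByteBuffer row = ByteBuffer.allocate(2 * maskBytes + payload.position());
        row.put(nulls).put(defaults).put(payload.array(), 0, payload.position());
        return row.array();
    }

    static Long[] read(byte[] row, int fieldCount) {
        int maskBytes = (fieldCount + 7) / 8;
        ByteBuffer buf = ByteBuffer.wrap(row);
        Long[] fields = new Long[fieldCount];
        int payloadPos = 2 * maskBytes;
        for (int i = 0; i < fieldCount; i++) {
            boolean isNull = (row[i / 8] & (1 << (i % 8))) != 0;
            boolean isDefault = (row[maskBytes + i / 8] & (1 << (i % 8))) != 0;
            if (isNull) {
                fields[i] = null;
            } else if (isDefault) {
                fields[i] = 0L;
            } else {
                fields[i] = buf.getLong(payloadPos);
                payloadPos += Long.BYTES;
            }
        }
        return fields;
    }
}
```

For nine fields with two nulls and two zeros, the row takes 44 bytes: 2 + 2 bytes of bitmaps plus five 8-byte values, instead of 72 bytes of slots.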
> > > > > > >
> > > > > > > What else? Please share your ideas.
> > > > > > >
> > > > > > > Vladimir.
> > > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > > >
> > >
> >
>
