ignite-dev mailing list archives

From Denis Mekhanikov <dmekhani...@gmail.com>
Subject Re: [IMPORTANT] Future of Binary Objects
Date Wed, 21 Nov 2018 08:37:11 GMT
People often ask whether they can store their data in the same format that
they use in their applications.
If you use Avro everywhere in your application, then why not store data in
the same format in Ignite?
So how about defining an interface that lists all the operations we need
and using it everywhere, without relying on any specific implementation?
*BinaryObject* looks like a suitable interface, but the only implementation
you can get from Ignite is *BinaryObjectImpl*.
I think we should make Ignite extensible and provide the ability to specify
your own data format by implementing the corresponding interfaces.
So if you like JSONB or Protobuf or anything else, you could enable a module
for the corresponding format and use it for storing the data.
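
To make the idea concrete, here is a minimal sketch of what such a pluggable
format could look like. Everything in it is hypothetical: FieldView,
DataFormat and TextLinesFormat are names invented for illustration, not
existing Ignite types, and the toy text format only stands in for a real
Avro or Protobuf module.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Ignite.
final class PluggableFormatSketch {
    /** Format-agnostic read view, the role BinaryObject would play. */
    interface FieldView {
        Object field(String name);
        boolean hasField(String name);
    }

    /** What a format module (Avro, Protobuf, ...) would implement. */
    interface DataFormat {
        String name();
        byte[] serialize(Map<String, Object> fields);
        FieldView wrap(byte[] bytes);
    }

    /** Toy format: "name=value" lines in UTF-8; values are read back as strings. */
    static final class TextLinesFormat implements DataFormat {
        @Override public String name() { return "text-lines"; }

        @Override public byte[] serialize(Map<String, Object> fields) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Object> e : fields.entrySet())
                sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            return sb.toString().getBytes(StandardCharsets.UTF_8);
        }

        @Override public FieldView wrap(byte[] bytes) {
            Map<String, Object> parsed = new LinkedHashMap<>();
            String text = new String(bytes, StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                int eq = line.indexOf('=');
                if (eq > 0)
                    parsed.put(line.substring(0, eq), line.substring(eq + 1));
            }
            return new FieldView() {
                @Override public Object field(String name) { return parsed.get(name); }
                @Override public boolean hasField(String name) { return parsed.containsKey(name); }
            };
        }
    }

    /** Storage code depends only on the interfaces, never on a concrete format. */
    static Object readField(DataFormat format, byte[] stored, String name) {
        return format.wrap(stored).field(name);
    }

    public static void main(String[] args) {
        DataFormat format = new TextLinesFormat();
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "Ann");
        person.put("age", 30);
        byte[] stored = format.serialize(person);
        System.out.println(readField(format, stored, "name"));
        System.out.println(readField(format, stored, "age"));
    }
}
```

The point of the sketch is only that readField never mentions a concrete
format, so swapping in another module would not touch the calling code.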

Denis

On Wed, Nov 21, 2018 at 10:10, Alexey Zinoviev <zaleslaw.sin@gmail.com> wrote:

> I like @Vyacheslav Daradur's approach.
>
> Maybe somebody could have a look at UnsafeRow in Spark
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> UnsafeRow is a concrete InternalRow that represents a mutable internal
> raw-memory (and hence unsafe) binary row format.
>
> P.S. If somebody is interested in this approach, I could share more
> information.
>
> On Tue, Nov 20, 2018 at 11:33, Sergi Vladykin <sergi.vladykin@gmail.com> wrote:
>
> > I really like the Protobuf format. It is probably not what we need for
> > O(1) field access,
> > but for compact data representation we can borrow a lot from it.
> >
> > Also, IMO, restricting field type changes is an absolutely sane idea.
> > The correct way to evolve a schema in the common case is to add new fields
> > and gradually deprecate the old ones. If you can skip default/null fields
> > in the binary format, this approach will not introduce any noticeable
> > performance/size overhead.
> >
> > Sergi
> >
> > On Tue, Nov 20, 2018 at 11:12, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> >
> > > I think one possible way to reduce overhead and TCO is the SQL schema
> > > approach.
> > >
> > > It assumes that metadata is stored separately from the serialized
> > > data to reduce size.
> > > In this case, most of the advantages of Binary Objects, such as O(1)
> > > field access and access without deserialization, can still be achieved.
> > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <vozerov@gridgain.com>
> > > wrote:
> > > >
> > > > Hi Alexey,
> > > >
> > > > Binary Objects only.
> > > >
> > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <zaleslaw.sin@gmail.com>
> > > > wrote:
> > > >
> > > > > Are we discussing only Core features here, or the roadmap for all
> > > > > components?
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:05, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > It is very likely that Apache Ignite 3.0 will be released next year, so
> > > > > > we need to start thinking about major product improvements. I'd like to
> > > > > > start with binary objects.
> > > > > >
> > > > > > Currently they are one of the main limiting factors for the product.
> > > > > > They are fat: 30+ bytes of overhead on average and a high TCO for Apache
> > > > > > Ignite compared to other vendors. They are slow: not suitable for SQL
> > > > > > at all.
> > > > > >
> > > > > > I would like to ask all of you who have worked with binary objects to
> > > > > > share your feedback and ideas, so that we understand what they should
> > > > > > look like in AI 3.0. This is a brainstorm: let's accumulate ideas first
> > > > > > and minimize criticism. Then we will work on the ideas in separate
> > > > > > topics.
> > > > > >
> > > > > > 1) Historical background
> > > > > >
> > > > > > BO were implemented around 2014 (Apache Ignite 1.5), when we started
> > > > > > working on the .NET and CPP clients. During design we had several ideas
> > > > > > in mind:
> > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > - interoperability between Java, .NET and CPP.
> > > > > >
> > > > > > Since then a number of other concepts have been mixed into the cocktail:
> > > > > > - Affinity key fields
> > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > - Binary Object as storage format
> > > > > >
> > > > > > 2) My proposals
> > > > > >
> > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > > > > Efficient storage typically has <10 bytes of overhead per row (no
> > > > > > metadata, no length, no hash code, etc.), allows super-fast field access,
> > > > > > supports different string formats (ASCII, UTF-8, etc.), supports
> > > > > > different temporal types (date, time, timestamp, timestamp with
> > > > > > timezone, etc.), and stores these types as efficiently as possible.
> > > > > >
> > > > > > What we need is to introduce an interface that converts a pair of
> > > > > > key-value objects into a row. This row will be used to store data and to
> > > > > > get fields from it. If you care about memory consumption and need SQL and
> > > > > > a strict schema, use one format. If you need flexibility and prefer
> > > > > > key-value access, use another format that stores binary objects
> > > > > > unchanged (current behavior).
> > > > > >
> > > > > > interface DataRowFormat {
> > > > > >     DataRow create(Object key, Object value); // primitives or binary objects
> > > > > >     DataRowMetadata metadata();
> > > > > > }
> > > > > >
> > > > > > 2.2) Remove affinity field from metadata
> > > > > > Affinity rules are governed by the cache, not the type. We should remove
> > > > > > "affinityFieldName" from metadata.
> > > > > >
> > > > > > 2.3) Remove restrictions on changing field type
> > > > > > I do not know why we did that in the first place. This restriction
> > > > > > prevents type evolution and confuses users.
> > > > > >
> > > > > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > > > > > fields, put fixed-length fields before variable-length ones.
> > > > > > Motivation: to save space.
> > > > > >
> > > > > > What else? Please share your ideas.
> > > > > >
> > > > > > Vladimir.
> > > > > >
> > > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
> >
>
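
For reference, the layout ideas quoted above (metadata kept separately from
the rows, a null bitmap, fixed-length fields placed before variable-length
ones) can be sketched roughly as follows. This is only an illustration under
assumptions of my own: Schema, Type, encode and read are hypothetical names,
not Ignite APIs, and the layout (null bitmap, fixed section, table of string
end offsets, string bytes) is one possible choice among many.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: these types are illustrative and are not Ignite APIs.
final class RowLayoutSketch {
    enum Type { INT, LONG, STRING }

    /** Schema lives outside the rows: column types and precomputed positions. */
    static final class Schema {
        final Type[] types;
        final int[] slot; // INT/LONG: byte offset in the fixed section; STRING: index in the offset table
        final int bitmapBytes, fixedBytes, varCount;

        Schema(Type... types) {
            this.types = types;
            slot = new int[types.length];
            int fixed = 0, vars = 0;
            for (int i = 0; i < types.length; i++) {
                if (types[i] == Type.STRING) slot[i] = vars++;
                else { slot[i] = fixed; fixed += types[i] == Type.INT ? 4 : 8; }
            }
            bitmapBytes = (types.length + 7) / 8;
            fixedBytes = fixed;
            varCount = vars;
        }
    }

    /** Row = null bitmap | fixed-length values | string end offsets | string bytes. */
    static byte[] encode(Schema s, Object... values) {
        byte[][] strings = new byte[s.varCount][];
        int varBytes = 0;
        for (int i = 0; i < values.length; i++) {
            if (s.types[i] == Type.STRING) {
                byte[] b = values[i] == null ? new byte[0]
                    : ((String) values[i]).getBytes(StandardCharsets.UTF_8);
                strings[s.slot[i]] = b;
                varBytes += b.length;
            }
        }
        int table = s.bitmapBytes + s.fixedBytes;
        ByteBuffer buf = ByteBuffer.allocate(table + 4 * s.varCount + varBytes);
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null)
                buf.put(i / 8, (byte) (buf.get(i / 8) | (1 << (i % 8)))); // mark null
            else if (s.types[i] == Type.INT)
                buf.putInt(s.bitmapBytes + s.slot[i], (Integer) values[i]);
            else if (s.types[i] == Type.LONG)
                buf.putLong(s.bitmapBytes + s.slot[i], (Long) values[i]);
        }
        int end = 0;
        buf.position(table + 4 * s.varCount);
        for (int v = 0; v < s.varCount; v++) {
            end += strings[v].length;
            buf.putInt(table + 4 * v, end); // absolute write into the offset table
            buf.put(strings[v]);            // relative write into the data area
        }
        return buf.array();
    }

    /** Reads one column straight from the bytes; no other column is decoded. */
    static Object read(Schema s, byte[] row, int col) {
        if ((row[col / 8] & (1 << (col % 8))) != 0) return null;
        ByteBuffer buf = ByteBuffer.wrap(row);
        if (s.types[col] == Type.INT) return buf.getInt(s.bitmapBytes + s.slot[col]);
        if (s.types[col] == Type.LONG) return buf.getLong(s.bitmapBytes + s.slot[col]);
        int table = s.bitmapBytes + s.fixedBytes;
        int v = s.slot[col];
        int from = v == 0 ? 0 : buf.getInt(table + 4 * (v - 1));
        int to = buf.getInt(table + 4 * v);
        return new String(row, table + 4 * s.varCount + from, to - from, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Schema s = new Schema(Type.STRING, Type.INT, Type.LONG, Type.STRING);
        byte[] row = encode(s, "Ann", 30, null, "Paris");
        System.out.println(read(s, row, 0) + ", " + read(s, row, 1) + ", "
            + read(s, row, 2) + ", " + read(s, row, 3));
    }
}
```

In this sketch a null fixed-length column still occupies its slot, so every
offset stays constant and each read is O(1). Dropping null slots would save
more space but would require counting bitmap bits on every read.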
