ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: [IMPORTANT] Future of Binary Objects
Date Wed, 21 Nov 2018 13:18:54 GMT
Pavel,

This could be solved with the aforementioned "RowFormat". We will be able to
configure a cache as follows: "this is a cache with strict type checks; the
first type is A, with fields A1, A2, A3; the second is B, with fields B1, B2".
So it will still be possible to serialize anything into a binary object, but
when it comes to actually storing it, an exception will be thrown if the
object does not match the configured types.
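
For illustration only, a strict type check of this kind could be sketched in
Java roughly as follows. Every name here (StrictRowFormat, allow, validate) is
hypothetical and not an existing Ignite API, and the rule that the field set
must match exactly is an assumption:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Apache Ignite.
final class StrictRowFormat {
    // Allowed type name -> exact set of field names declared for that type.
    private final Map<String, Set<String>> schema = new LinkedHashMap<>();

    // Registers a type that the cache is allowed to store.
    StrictRowFormat allow(String typeName, String... fields) {
        schema.put(typeName, Set.of(fields));
        return this;
    }

    // Meant to run at store time; serialization itself stays unrestricted.
    void validate(String typeName, Map<String, Object> fields) {
        Set<String> expected = schema.get(typeName);
        if (expected == null)
            throw new IllegalArgumentException("Type not allowed in this cache: " + typeName);
        // Assumption: the stored object must carry exactly the declared fields.
        if (!expected.equals(fields.keySet()))
            throw new IllegalArgumentException("Unexpected fields for " + typeName + ": " + fields.keySet());
    }
}

final class StrictRowFormatDemo {
    public static void main(String[] args) {
        StrictRowFormat fmt = new StrictRowFormat()
            .allow("A", "A1", "A2", "A3")
            .allow("B", "B1", "B2");

        fmt.validate("B", Map.of("B1", 1, "B2", "x")); // matches the schema

        try {
            fmt.validate("C", Map.of("C1", 1)); // type C was never configured
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Here type B with fields B1 and B2 passes the check, while type C is rejected
at the point where it would reach the store.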

Makes sense?

On Wed, Nov 21, 2018 at 3:21 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:

> Vladimir,
>
> IMO the issue is that we allow any type of data in the cache (put Person,
> then put int to the same cache).
> Are we going to address this in 3.0 and enforce key/value types according
> to cache configuration?
> This will provide more space for optimizations.
>
> On Wed, Nov 21, 2018 at 3:14 PM Vladimir Ozerov <vozerov@gridgain.com>
> wrote:
>
> > Denis,
> >
> > In theory, data conversion could be avoided in certain cases. E.g.,
> > consider loading data through a streamer. We know the cache, and we
> > know its metadata and row format. So instead of doing "user object" ->
> > "binary object" -> "row", we can do "user object" -> "row".
> >
> > On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov <dmekhanikov@gmail.com>
> > wrote:
> >
> > > Vladimir,
> > >
> > > Thank you for the clarification. I didn't see this distinction at first.
> > >
> > > I meant using customizable formats for all serialization, not only for
> > > storage.
> > > The idea behind my proposal is to avoid data conversion when loading
> > > data into Ignite.
> > > It will complicate usage of thin clients though, so I'm not sure that
> > > it will make users happier.
> > >
> > > But anyway, the same approach may be used for storage only.
> > >
> > > Denis
> > >
> > > Wed, Nov 21, 2018 at 12:57, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > >
> > > > Denis,
> > > >
> > > > Could you please clarify: are you talking about storage, i.e. how
> > > > objects are stored in Ignite, or about serialization as a whole? I'd
> > > > like to better understand whether the use case you described is
> > > > relevant to my idea of splitting binary objects from the underlying
> > > > storage format.
> > > > My vision was that we can use the current BinaryObject protocol (with
> > > > whatever optimizations are needed) as a common format for communication
> > > > between nodes and as a common serialization protocol. This is very
> > > > handy because all participants (Java, C++, .NET, all sorts of thin
> > > > clients) are able to work with it. So if I have a "Person" class in
> > > > Java, I can read it on any other platform without any additional
> > > > configuration. But when it comes to *storage*, we may introduce a
> > > > pluggable row format interface which will apply any necessary
> > > > transformations. So if someone wants to store objects in Avro/Protobuf
> > > > and is ready to configure and implement it (generate classes,
> > > > implement field extraction logic, etc.), then they just implement
> > > > that interface. The key is that this implementation will only be
> > > > needed in Java, not in the dozen platforms we support.
> > > >
> > > > But when it comes to how to store object in a cache
> > > >
> > > > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <dmekhanikov@gmail.com>
> > > > wrote:
> > > >
> > > > > People often ask about the possibility of storing their data in the
> > > > > same format that they use in their applications.
> > > > > If you use Avro everywhere in your application, then why not store
> > > > > data in the same format in Ignite?
> > > > > So, how about making an interface that lists all the operations we
> > > > > need, and using this interface everywhere without relying on any
> > > > > specific implementation?
> > > > > *BinaryObject* looks like a suitable interface, but the only
> > > > > implementation that you can get from Ignite is *BinaryObjectImpl*.
> > > > > I think we should make Ignite extensible and provide the capability
> > > > > to specify your own data format by implementing the corresponding
> > > > > interfaces.
> > > > > So, if you like JSONB or Protobuf or whatever else, you could enable
> > > > > a module for the corresponding format and use it for storing the data.
> > > > >
> > > > > Denis
> > > > >
> > > > > Wed, Nov 21, 2018 at 10:10, Alexey Zinoviev <zaleslaw.sin@gmail.com> wrote:
> > > > >
> > > > > > I like @Vyacheslav Daradur's approach.
> > > > > >
> > > > > > Maybe somebody could have a look at UnsafeRow in Spark:
> > > > > >
> > > > > > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > > > >
> > > > > > UnsafeRow is a concrete InternalRow that represents a mutable
> > > > > > internal raw-memory (and hence unsafe) binary row format.
> > > > > >
> > > > > > P.S. If somebody is interested in this approach, I could share more
> > > > > > information.
> > > > > >
> > > > > > Tue, Nov 20, 2018 at 11:33, Sergi Vladykin <sergi.vladykin@gmail.com> wrote:
> > > > > >
> > > > > > > I really like the Protobuf format. It is probably not what we
> > > > > > > need for O(1) field access, but for compact data representation
> > > > > > > we can borrow a lot from it.
> > > > > > >
> > > > > > > Also, IMO, restricting field type changes is an absolutely sane
> > > > > > > idea.
> > > > > > > The correct way to evolve a schema in the common case is to add
> > > > > > > new fields and gradually deprecate the old ones. If you can skip
> > > > > > > default/null fields in the binary format, this approach will not
> > > > > > > introduce any noticeable performance/size overhead.
> > > > > > >
> > > > > > > Sergi
> > > > > > >
> > > > > > > Tue, Nov 20, 2018 at 11:12, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > >
> > > > > > > > I think one possible way to reduce overhead and TCO is the SQL
> > > > > > > > schema approach.
> > > > > > > >
> > > > > > > > That assumes that metadata will be stored separately from the
> > > > > > > > serialized data to reduce size.
> > > > > > > > In this case, most of the advantages of Binary Objects, like
> > > > > > > > O(1) access and access without deserialization, may still be
> > > > > > > > achieved.
> > > > > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <vozerov@gridgain.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Alexey,
> > > > > > > > >
> > > > > > > > > Binary Objects only.
> > > > > > > > >
> > > > > > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <zaleslaw.sin@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Are we discussing Core features only here, or the roadmap
> > > > > > > > > > for all components?
> > > > > > > > > >
> > > > > > > > > > Tue, Nov 20, 2018 at 10:05, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > It is very likely that Apache Ignite 3.0 will be released
> > > > > > > > > > > next year. So we need to start thinking about major product
> > > > > > > > > > > improvements. I'd like to start with binary objects.
> > > > > > > > > > >
> > > > > > > > > > > Currently they are one of the main limiting factors for the
> > > > > > > > > > > product. They are fat: 30+ bytes of overhead on average,
> > > > > > > > > > > which means a high TCO for Apache Ignite compared to other
> > > > > > > > > > > vendors. They are slow: not suitable for SQL at all.
> > > > > > > > > > >
> > > > > > > > > > > I would like to ask all of you who have worked with binary
> > > > > > > > > > > objects to share your feedback and ideas, so that we
> > > > > > > > > > > understand what they should look like in AI 3.0. This is a
> > > > > > > > > > > brainstorm: let's accumulate ideas first and minimize
> > > > > > > > > > > criticism. Then we will work on the ideas in separate topics.
> > > > > > > > > > >
> > > > > > > > > > > 1) Historical background
> > > > > > > > > > >
> > > > > > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> > > > > > > > > > > started working on .NET and CPP clients. During design we had
> > > > > > > > > > > several ideas in mind:
> > > > > > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > > > > > - interoperability between Java, .NET and CPP.
> > > > > > > > > > >
> > > > > > > > > > > Since then a number of other concepts were added to the
> > > > > > > > > > > cocktail:
> > > > > > > > > > > - Affinity key fields
> > > > > > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > > > > > - Binary Object as storage format
> > > > > > > > > > >
> > > > > > > > > > > 2) My proposals
> > > > > > > > > > >
> > > > > > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > > > > > Binary Objects are terrible candidates for storage. Too fat,
> > > > > > > > > > > too slow. Efficient storage typically has <10 bytes of
> > > > > > > > > > > overhead per row (no metadata, no length, no hash code,
> > > > > > > > > > > etc.), allows super-fast field access, supports different
> > > > > > > > > > > string formats (ASCII, UTF-8, etc.), supports different
> > > > > > > > > > > temporal types (date, time, timestamp, timestamp with
> > > > > > > > > > > timezone, etc.), and stores these types as efficiently as
> > > > > > > > > > > possible.
> > > > > > > > > > >
> > > > > > > > > > > What we need is to introduce an interface which will convert
> > > > > > > > > > > a pair of key-value objects into a row. This row will be used
> > > > > > > > > > > to store data and to get fields from it. If you care about
> > > > > > > > > > > memory consumption and need SQL and a strict schema, use one
> > > > > > > > > > > format. If you need flexibility and prefer key-value access,
> > > > > > > > > > > use another format which will store binary objects unchanged
> > > > > > > > > > > (current behavior).
> > > > > > > > > > >
> > > > > > > > > > > interface DataRowFormat {
> > > > > > > > > > >     // key and value are primitives or binary objects
> > > > > > > > > > >     DataRow create(Object key, Object value);
> > > > > > > > > > >     DataRowMetadata metadata();
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > 2.2) Remove affinity field from metadata
> > > > > > > > > > > Affinity rules are governed by the cache, not by the type. We
> > > > > > > > > > > should remove "affintiyFieldName" from metadata.
> > > > > > > > > > >
> > > > > > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > > > > > I do not know why we did that in the first place. This
> > > > > > > > > > > restriction prevents type evolution and confuses users.
> > > > > > > > > > >
> > > > > > > > > > > 2.4) Use bitmaps for "null" and default values and for
> > > > > > > > > > > fixed-length fields, put fixed-length fields before
> > > > > > > > > > > variable-length.
> > > > > > > > > > > Motivation: to save space.
> > > > > > > > > > >
> > > > > > > > > > > What else? Please share your ideas.
> > > > > > > > > > >
> > > > > > > > > > > Vladimir.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
