ignite-dev mailing list archives

From Pavel Tupitsyn <ptupit...@apache.org>
Subject Re: [IMPORTANT] Future of Binary Objects
Date Wed, 21 Nov 2018 14:06:37 GMT
Makes sense.

I'm trying to grasp this from usability POV.
Having two ways of storing data with different behavior can be confusing.
In .NET we already have this issue with DateTime: if you want SQL, you
get subtly different behavior.

So IMO we should enable strict type checks for all caches, even non-SQL
ones.
Users will be able to evolve types by adding/removing fields, but at least
type id will be fixed.
And for SQL caches you'll get a clear exception like "Field does not exist
in SQL schema: foobar"
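
To make the idea concrete, here is a minimal sketch of such a store-time check. It is
only an illustration under stated assumptions: StrictTypeCheckSketch, CacheSchema and
validate are hypothetical names, not existing Ignite APIs, and a value is modeled as a
plain map of field names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names are Ignite APIs.
final class StrictTypeCheckSketch {
    /** Assumed per-cache configuration: one fixed value type and, for SQL caches, its known fields. */
    static final class CacheSchema {
        final String typeName;
        final boolean sqlEnabled;
        final Set<String> sqlFields;

        CacheSchema(String typeName, boolean sqlEnabled, String... sqlFields) {
            this.typeName = typeName;
            this.sqlEnabled = sqlEnabled;
            this.sqlFields = new HashSet<>(Arrays.asList(sqlFields));
        }
    }

    /** Rejects a value at store time if it violates the cache schema. */
    static void validate(CacheSchema schema, String typeName, Map<String, Object> fields) {
        // The type is fixed for every cache, SQL or not.
        if (!schema.typeName.equals(typeName))
            throw new IllegalArgumentException("Unexpected type for this cache: " + typeName);

        // Non-SQL caches may still evolve by adding or removing fields.
        if (!schema.sqlEnabled)
            return;

        for (String field : fields.keySet()) {
            if (!schema.sqlFields.contains(field))
                throw new IllegalArgumentException("Field does not exist in SQL schema: " + field);
        }
    }

    public static void main(String[] args) {
        CacheSchema schema = new CacheSchema("Person", true, "name", "age");
        Map<String, Object> value = new HashMap<>();
        value.put("name", "Ann");
        validate(schema, "Person", value); // accepted: known type, known field

        value.put("foobar", 1);
        try {
            validate(schema, "Person", value);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The type check applies to every cache, while the field check is skipped for non-SQL
caches so that types can still evolve there.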

On Wed, Nov 21, 2018 at 4:19 PM Vladimir Ozerov <vozerov@gridgain.com>
wrote:

> Pavel,
>
> This could be solved with the aforementioned "RowFormat". We will be able to
> configure a cache as follows: "this is a cache with strict type checks; the
> first type is A, with fields A1, A2, A3, the second is B with fields B1, B2".
> So it will be possible to serialize anything into a binary object, but when
> it comes to the actual store, an exception will be thrown.
>
> Makes sense?
>
> On Wed, Nov 21, 2018 at 3:21 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> wrote:
>
> > Vladimir,
> >
> > IMO the issue is that we allow any type of data in the cache (put Person,
> > then put int to the same cache).
> > Are we going to address this in 3.0 and enforce key/value types according
> > to cache configuration?
> > This will provide more space for optimizations.
> >
> > On Wed, Nov 21, 2018 at 3:14 PM Vladimir Ozerov <vozerov@gridgain.com>
> > wrote:
> >
> > > Denis,
> > >
> > > In theory data conversion could be avoided in certain cases. E.g. consider
> > > the case of loading data through a streamer. We know the cache, we know its
> > > metadata and row format. So instead of doing "user object" -> "binary
> > > object" -> "row", we can do "user object" -> "row".
> > >
> > > On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov <dmekhanikov@gmail.com>
> > > wrote:
> > >
> > > > Vladimir,
> > > >
> > > > Thank you for the clarification. I didn't see this distinction at first.
> > > >
> > > > I meant using customizable formats for all serialization, not only for
> > > > storage.
> > > > The idea behind my proposal is to avoid data conversion when loading data
> > > > into Ignite.
> > > > It will complicate usage of thin clients though, so I'm not sure that it
> > > > will make users happier.
> > > >
> > > > But anyway, the same approach may be used for storage only.
> > > >
> > > > Denis
> > > >
> > > > On Wed, Nov 21, 2018 at 12:57, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > >
> > > > > Denis,
> > > > >
> > > > > Could you please clarify: are you talking about storage, i.e. how objects
> > > > > are stored in Ignite, or about serialization as a whole? I'd like to better
> > > > > understand whether the use case you described is relevant to my idea of
> > > > > splitting binary objects from the underlying storage format.
> > > > > My vision was that we can use the current BinaryObject protocol (with
> > > > > whatever optimizations are needed) as a common format for communication
> > > > > between nodes and as a common serialization protocol. This is very handy
> > > > > because all participants (Java, C++, .NET, all sorts of thin clients) are
> > > > > able to work with it. So if I have a "Person" class in Java, I can read it
> > > > > on any other platform without any additional configuration. But when it
> > > > > comes to *storage*, we may introduce a pluggable row format interface
> > > > > which will apply any necessary transformations. So if someone wants to
> > > > > store objects in Avro/Protobuf and is ready to configure and implement it
> > > > > (generate classes, implement field extraction logic, etc.), they just
> > > > > implement that interface. The key is that this implementation will only be
> > > > > needed in Java, not in the dozen platforms we support.
> > > > >
> > > > > But when it comes to how to store object in a cache
> > > > >
> > > > > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <dmekhanikov@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > People often ask about the possibility of storing their data in the same
> > > > > > format that they use in their applications.
> > > > > > If you use Avro everywhere in your application, then why not store data in
> > > > > > the same format in Ignite?
> > > > > > So, how about making an interface that lists all the operations we need,
> > > > > > and using this interface everywhere without relying on any specific
> > > > > > implementation?
> > > > > > *BinaryObject* looks like a suitable interface, but the only
> > > > > > implementation that you can get from Ignite is *BinaryObjectImpl*.
> > > > > > I think we should make Ignite extensible and provide the capability to
> > > > > > specify your own data format by implementing the corresponding interfaces.
> > > > > > So, if you like JSONB or Protobuf or whatever else, you could enable a
> > > > > > module for the corresponding format and use it for storing the data.
> > > > > >
> > > > > > Denis
> > > > > >
> > > > > > On Wed, Nov 21, 2018 at 10:10, Alexey Zinoviev <zaleslaw.sin@gmail.com> wrote:
> > > > > >
> > > > > > > I like @Vyacheslav Daradur's approach.
> > > > > > >
> > > > > > > Maybe somebody could have a look at UnsafeRow in Spark:
> > > > > > > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > > > > > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > > > > > > raw-memory (and hence unsafe) binary row format.
> > > > > > >
> > > > > > > P.S. If somebody is interested in this approach, I could share more
> > > > > > > information.
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 11:33, Sergi Vladykin <sergi.vladykin@gmail.com> wrote:
> > > > > > >
> > > > > > > > I really like the Protobuf format. It is probably not what we need for
> > > > > > > > O(1) field access, but for compact data representation we can derive a
> > > > > > > > lot from it.
> > > > > > > >
> > > > > > > > Also, IMO, restricting field type changes is an absolutely sane idea.
> > > > > > > > The correct way to evolve a schema in the common case is to add new
> > > > > > > > fields and gradually deprecate the old ones. If you can skip
> > > > > > > > default/null fields in the binary format, this approach will not
> > > > > > > > introduce any noticeable performance/size overhead.
> > > > > > > >
> > > > > > > > Sergi
> > > > > > > >
> > > > > > > > On Tue, Nov 20, 2018 at 11:12, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I think one possible way to reduce overhead and TCO is the SQL schema
> > > > > > > > > approach.
> > > > > > > > >
> > > > > > > > > That assumes that metadata will be stored separately from the
> > > > > > > > > serialized data to reduce size.
> > > > > > > > > In this case, most of the advantages of Binary Objects, like access in
> > > > > > > > > O(1) and access without deserialization, may still be achieved.
> > > > > > > > >
> > > > > > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <vozerov@gridgain.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Alexey,
> > > > > > > > > >
> > > > > > > > > > Binary Objects only.
> > > > > > > > > >
> > > > > > > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <zaleslaw.sin@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Are we discussing only Core features here, or the roadmap for all
> > > > > > > > > > > components?
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Nov 20, 2018 at 10:05, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Igniters,
> > > > > > > > > > > >
> > > > > > > > > > > > It is very likely that Apache Ignite 3.0 will be released next year, so we
> > > > > > > > > > > > need to start thinking about major product improvements. I'd like to start
> > > > > > > > > > > > with binary objects.
> > > > > > > > > > > >
> > > > > > > > > > > > Currently they are one of the main limiting factors for the product. They
> > > > > > > > > > > > are fat: 30+ bytes of overhead on average, which means a high TCO for
> > > > > > > > > > > > Apache Ignite compared to other vendors. They are slow: not suitable for
> > > > > > > > > > > > SQL at all.
> > > > > > > > > > > >
> > > > > > > > > > > > I would like to ask all of you who have worked with binary objects to share
> > > > > > > > > > > > your feedback and ideas, so that we understand how they should look in AI
> > > > > > > > > > > > 3.0. This is a brainstorm: let's accumulate ideas first and minimize
> > > > > > > > > > > > criticism. Then we will work on ideas in separate topics.
> > > > > > > > > > > >
> > > > > > > > > > > > 1) Historical background
> > > > > > > > > > > >
> > > > > > > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started working
> > > > > > > > > > > > on .NET and CPP clients. During design we had several ideas in mind:
> > > > > > > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > > > > > > - interoperability between Java, .NET and CPP.
> > > > > > > > > > > >
> > > > > > > > > > > > Since then a number of other concepts were mixed into the cocktail:
> > > > > > > > > > > > - Affinity key fields
> > > > > > > > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > > > > > > > - Binary Object as storage format
> > > > > > > > > > > >
> > > > > > > > > > > > 2) My proposals
> > > > > > > > > > > >
> > > > > > > > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > > > > > > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > > > > > > > > > > Efficient storage typically has <10 bytes overhead per row (no metadata, no
> > > > > > > > > > > > length, no hash code, etc.), allows super-fast field access, supports
> > > > > > > > > > > > different string formats (ASCII, UTF-8, etc.), supports different temporal
> > > > > > > > > > > > types (date, time, timestamp, timestamp with timezone, etc.), and stores
> > > > > > > > > > > > these types as efficiently as possible.
> > > > > > > > > > > >
> > > > > > > > > > > > What we need is to introduce an interface which will convert a pair of
> > > > > > > > > > > > key-value objects into a row. This row will be used to store data and to
> > > > > > > > > > > > get fields from it. If you care about memory consumption and need SQL and a
> > > > > > > > > > > > strict schema, use one format. If you need flexibility and prefer key-value
> > > > > > > > > > > > access, use another format which will store binary objects unchanged
> > > > > > > > > > > > (current behavior).
> > > > > > > > > > > >
> > > > > > > > > > > > interface DataRowFormat {
> > > > > > > > > > > >     // key and value are primitives or binary objects
> > > > > > > > > > > >     DataRow create(Object key, Object value);
> > > > > > > > > > > >     DataRowMetadata metadata();
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > 2.2) Remove affinity field from metadata
> > > > > > > > > > > > Affinity rules are governed by cache, not type. We should remove
> > > > > > > > > > > > "affinityFieldName" from metadata.
> > > > > > > > > > > >
> > > > > > > > > > > > 2.3) Remove restrictions on changing field type
> > > > > > > > > > > > I do not know why we did that in the first place. This restriction prevents
> > > > > > > > > > > > type evolution and confuses users.
> > > > > > > > > > > >
> > > > > > > > > > > > 2.4) Use bitmaps for "null" and default values and for fixed-length fields,
> > > > > > > > > > > > put fixed-length fields before variable-length.
> > > > > > > > > > > > Motivation: to save space.
> > > > > > > > > > > >
> > > > > > > > > > > > What else? Please share your ideas.
> > > > > > > > > > > >
> > > > > > > > > > > > Vladimir.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
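
For reference, proposal 2.4 in the original message quoted above (a bitmap for nulls,
fixed-length fields placed before variable-length ones) could look roughly like the
sketch below. Everything in it is an assumption made for illustration: the class and
method names are hypothetical, fixed-length fields are modeled as 4-byte ints,
variable-length fields as UTF-8 strings with a 2-byte length prefix, and null fields
take no space beyond their bit in the bitmap. It is not the Ignite binary format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a compact row layout; not an Ignite API or format.
final class CompactRowSketch {
    /** Layout: [null bitmap][non-null ints, 4 bytes each][non-null strings: 2-byte length + UTF-8]. */
    static byte[] encode(Integer[] fixed, String[] var) {
        int n = fixed.length + var.length;
        byte[] bitmap = new byte[(n + 7) / 8];
        byte[][] utf8 = new byte[var.length][];
        int size = bitmap.length;

        for (int i = 0; i < fixed.length; i++) {
            if (fixed[i] == null)
                bitmap[i >> 3] |= 1 << (i & 7); // mark field i as null
            else
                size += Integer.BYTES;
        }
        for (int j = 0; j < var.length; j++) {
            int i = fixed.length + j;
            if (var[j] == null) {
                bitmap[i >> 3] |= 1 << (i & 7);
            } else {
                // Assumes each encoded string is shorter than 65536 bytes.
                utf8[j] = var[j].getBytes(StandardCharsets.UTF_8);
                size += Short.BYTES + utf8[j].length;
            }
        }

        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(bitmap);
        for (Integer v : fixed)
            if (v != null) buf.putInt(v);
        for (byte[] b : utf8)
            if (b != null) buf.putShort((short) b.length).put(b);
        return buf.array();
    }

    /** Reads a row back; the field counts come from separately stored metadata. */
    static Object[] decode(byte[] row, int fixedCount, int varCount) {
        int n = fixedCount + varCount;
        ByteBuffer buf = ByteBuffer.wrap(row);
        byte[] bitmap = new byte[(n + 7) / 8];
        buf.get(bitmap);

        Object[] out = new Object[n];
        for (int i = 0; i < n; i++) {
            if ((bitmap[i >> 3] & (1 << (i & 7))) != 0)
                continue; // null field: nothing was written for it
            if (i < fixedCount) {
                out[i] = buf.getInt();
            } else {
                byte[] b = new byte[buf.getShort() & 0xFFFF];
                buf.get(b);
                out[i] = new String(b, StandardCharsets.UTF_8);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] row = encode(new Integer[] {1, null}, new String[] {"ab", null});
        System.out.println(row.length + " bytes");
    }
}
```

Because nulls are skipped, this variant reads fields sequentially; keeping O(1) access
would need either fixed slots for null fixed-length fields or an offset table.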
