ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject [IMPORTANT] Future of Binary Objects
Date Tue, 20 Nov 2018 07:05:03 GMT
Igniters,

It is very likely that Apache Ignite 3.0 will be released next year. So we
need to start thinking about major product improvements. I'd like to start
with binary objects.

Currently they are one of the main limiting factors for the product. They
are fat - 30+ bytes of overhead on average, which raises the TCO of Apache
Ignite compared to other vendors. They are slow - not suitable for SQL at all.

I would like to ask all of you who have worked with binary objects to share
your feedback and ideas, so that we understand what they should look like in
AI 3.0. This is a brainstorm - let's accumulate ideas first and minimize
criticism. Then we will work on the ideas in separate topics.

1) Historical background

BO were implemented around 2014 (Apache Ignite 1.5) when we started working
on .NET and CPP clients. During design we had several ideas in mind:
- ability to read object fields in O(1) without deserialization
- interoperability between Java, .NET and CPP.

Since then a number of other concepts were mixed into the cocktail:
- Affinity key fields
- Strict typing for existing fields (aka metadata)
- Binary Object as storage format

2) My proposals

2.1) Introduce "Data Row Format" interface
Binary Objects are terrible candidates for storage. Too fat, too slow.
Efficient storage typically has <10 bytes overhead per row (no metadata, no
length, no hash code, etc), allows super-fast field access, supports
different string formats (ASCII, UTF-8, etc), supports different temporal
types (date, time, timestamp, timestamp with timezone, etc), and stores
these types as efficiently as possible.

What we need is to introduce an interface which will convert a pair of
key-value objects into a row. This row will be used to store data and to
get fields from it. If you care about memory consumption and need SQL and a
strict schema - use one format. If you need flexibility and prefer key-value
access - use another format, which will store binary objects unchanged
(current behavior).

interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects
    DataRowMetadata metadata();
}
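
To make the idea a bit more concrete, here is a minimal sketch of two formats
behind that interface. Everything except the DataRowFormat signatures is an
assumption made for illustration: the shapes of DataRow and DataRowMetadata,
and the class names PassThroughRowFormat and IntLongRowFormat, are
hypothetical and not existing Ignite API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical shape: the proposal does not define DataRow.
interface DataRow {
    Object field(int idx); // read one field by position
}

// Hypothetical shape: the proposal does not define DataRowMetadata.
interface DataRowMetadata {
    List<String> fieldNames();
}

// The interface from the proposal, unchanged.
interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects
    DataRowMetadata metadata();
}

// Flexible format: keeps key and value exactly as given (current behavior).
final class PassThroughRowFormat implements DataRowFormat {
    @Override public DataRow create(Object key, Object value) {
        Object[] pair = {key, value};
        return idx -> pair[idx];
    }

    @Override public DataRowMetadata metadata() {
        return () -> Arrays.asList("key", "value");
    }
}

// Strict format for an (int key, long value) schema: 12 bytes, no per-row metadata.
final class IntLongRowFormat implements DataRowFormat {
    @Override public DataRow create(Object key, Object value) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES);
        buf.putInt(0, (Integer) key);            // fixed offset 0
        buf.putLong(Integer.BYTES, (Long) value); // fixed offset 4

        return idx -> {
            if (idx == 0) return buf.getInt(0);
            if (idx == 1) return buf.getLong(Integer.BYTES);
            throw new IndexOutOfBoundsException("field " + idx);
        };
    }

    @Override public DataRowMetadata metadata() {
        return () -> Arrays.asList("key", "value");
    }
}
```

In this sketch the choice of format would be made per cache, so callers only
see DataRow and never depend on how the row is laid out.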

2.2) Remove affinity field from metadata
Affinity rules are governed by the cache, not the type. We should remove
"affinityFieldName" from metadata.

2.3) Remove restrictions on changing field type
I do not know why we did that in the first place. This restriction prevents
type evolution and confuses users.

2.4) Use bitmaps for "null" and default values and for fixed-length fields,
and put fixed-length fields before variable-length ones.
Motivation: to save space.
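
A rough sketch of what such a layout could look like, assuming a hypothetical
three-column schema (int id, long ts, string name). The class CompactRow and
its methods are made up for illustration. Only the null bitmap is shown; a
bitmap for default values would work the same way. With a single
variable-length field placed last, no length prefix is needed at all; several
variable-length fields would need offsets or lengths.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Layout: [1-byte null bitmap][fixed-length fields][variable-length field].
final class CompactRow {
    // Bit i of the bitmap is set when field i is null; null fields take no space.
    static byte[] encode(Integer id, Long ts, String name) {
        byte nulls = 0;
        if (id == null) nulls |= 1;
        if (ts == null) nulls |= 2;
        if (name == null) nulls |= 4;

        byte[] str = name == null ? new byte[0] : name.getBytes(StandardCharsets.UTF_8);
        int size = 1 + (id == null ? 0 : Integer.BYTES) + (ts == null ? 0 : Long.BYTES) + str.length;

        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(nulls);
        if (id != null) buf.putInt(id);   // fixed-length fields first
        if (ts != null) buf.putLong(ts);
        buf.put(str);                     // variable-length field runs to the end of the row
        return buf.array();
    }

    static boolean isNull(byte[] row, int field) {
        return (row[0] & (1 << field)) != 0;
    }

    // The offset is computed from the bitmap alone, without scanning the row.
    static Long ts(byte[] row) {
        if (isNull(row, 1)) return null;
        int off = 1 + (isNull(row, 0) ? 0 : Integer.BYTES);
        return ByteBuffer.wrap(row).getLong(off);
    }

    static String name(byte[] row) {
        if (isNull(row, 2)) return null;
        int off = 1 + (isNull(row, 0) ? 0 : Integer.BYTES) + (isNull(row, 1) ? 0 : Long.BYTES);
        return new String(row, off, row.length - off, StandardCharsets.UTF_8);
    }
}
```

Here a row with a null id, a timestamp and a two-character ASCII name takes
1 + 8 + 2 = 11 bytes in total.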

What else? Please share your ideas.

Vladimir.
