ignite-dev mailing list archives

From Vyacheslav Daradur <daradu...@gmail.com>
Subject Re: Data compression in Ignite 2.0
Date Wed, 09 Aug 2017 20:05:55 GMT
Vladimir, thank you for the detailed explanation.

I think I've understood the main idea of the storage compression you described.

I'll join the new discussion after studying the given material and
completing the varint optimization [1].

[1] https://issues.apache.org/jira/browse/IGNITE-5097
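
For readers unfamiliar with the term, here is a minimal sketch of what
varint encoding generally means: small integers take fewer bytes because
each byte carries 7 payload bits plus a "more bytes follow" flag. This is
the common LEB128-style scheme, shown only as an illustration; the class
name VarintSketch is hypothetical and this is not the code from
IGNITE-5097.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: unsigned LEB128-style varint, NOT the IGNITE-5097 implementation.
final class VarintSketch {
    /** Encodes an int into 1 to 5 bytes, least significant 7-bit group first. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        // While bits remain above the low 7, emit a group with the continuation bit set.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so negative inputs terminate after 5 bytes
        }
        out.write(value); // last group, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(); assumes at most 5 well-formed bytes. */
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
            shift += 7;
        }
        throw new IllegalArgumentException("Truncated varint");
    }
}
```

With this scheme VarintSketch.encode(300) yields 2 bytes instead of the 4
bytes of a fixed-width int, while values that need more than 28 bits
(including negatives) cost 5 bytes.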

2017-08-02 15:43 GMT+03:00 Alexey Kuznetsov <akuznetsov@apache.org>:

> Vova,
>
> Finally we are back to my initial idea: to look at how "big databases"
> compress data :)
>
>
> Just a reminder of how IBM DB2 does this [1].
>
> [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
>
> On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <vozerov@gridgain.com>
> wrote:
>
> > Vyacheslav,
> >
> > This is not about my needs, but about the product :-) BinaryObject is a
> > central entity used for both data transfer and data storage. This is both
> > good and bad at the same time.
> >
> > The good thing is that as we optimize the binary protocol, we improve both
> > network and storage performance at the same time. We have at least 3 things
> > which will be included in the product soon: varint encoding [1], optimized
> > string encoding [2] and null-field optimization [3]. The bad thing is that
> > the binary object format is not well suited for data storage optimizations,
> > including compression. For example, one good compression technique is to
> > organize data in a column-store format, or to introduce a shared
> > "dictionary" of unique values at the cache level. In both cases N equal
> > values are not stored N times. Instead, we store one value and N references
> > to it, or so. This way 2x-10x compression is possible depending on workload
> > type. The binary object protocol with some compression on top of it cannot
> > give such an improvement, because it would compress data in individual
> > objects instead of compressing the whole cache data in a single context.
> >
> > That said, I propose to give up adding compression to BinaryObject. This is
> > a dead end. Instead, we should:
> > 1) Optimize the protocol itself to be more compact, as described in the
> > aforementioned Ignite tickets
> > 2) Start a new discussion about storage compression
> >
> > You can read papers from other vendors to get a better understanding of
> > possible compression options. E.g. Oracle has a lot of compression
> > techniques, including heat maps, background compression, per-block
> > compression, data dictionaries, etc. [4].
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5097
> > [2] https://issues.apache.org/jira/browse/IGNITE-5655
> > [3] https://issues.apache.org/jira/browse/IGNITE-3939
> > [4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf
> >
> > Vladimir.
> >
> >
>
> --
> Alexey Kuznetsov
>
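
To make the shared "dictionary" idea quoted above concrete, here is a rough
sketch of one column that stores each distinct value once and keeps only a
small integer reference per row. All names (DictionaryColumnSketch, add,
get) are hypothetical and chosen for illustration; nothing here is Ignite
API or a proposed design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a per-column dictionary; names are hypothetical, not Ignite API.
final class DictionaryColumnSketch {
    private final Map<String, Integer> idsByValue = new HashMap<>(); // value -> dictionary id
    private final List<String> valuesById = new ArrayList<>();       // dictionary id -> value
    private final List<Integer> refs = new ArrayList<>();            // one id per row

    /** Appends a row; an already-seen value costs only one more reference. */
    void add(String value) {
        Integer id = idsByValue.get(value);
        if (id == null) {
            id = valuesById.size();
            idsByValue.put(value, id);
            valuesById.add(value);
        }
        refs.add(id);
    }

    /** Resolves the value of a row through its dictionary reference. */
    String get(int row) {
        return valuesById.get(refs.get(row));
    }

    int rowCount() {
        return refs.size();
    }

    int distinctCount() {
        return valuesById.size();
    }
}
```

Adding "Moscow", "London", "Moscow", "Moscow" gives 4 rows but only 2
stored strings. The saving depends on how often values repeat across the
whole cache, which is why it cannot be achieved by compressing each
object in isolation.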



-- 
Best Regards, Vyacheslav D.
