ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vyacheslav Daradur <daradu...@gmail.com>
Subject Re: Data compression in Ignite 2.0
Date Fri, 25 Aug 2017 11:51:52 GMT
Hi, should I close the initial ticket [1] as "Won't Fix" and add link to
the new discusion about storage compression [2] in comments?

[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2]
http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html

2017-08-09 23:05 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:

> Vladimir, thank you for detailed explanation.
>
> I think I've understanded the main idea of described storage compression.
>
> I'll join the new discussion after researching of given material and
> comlpetion of varint-optimization [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5097
>
> 2017-08-02 15:43 GMT+03:00 Alexey Kuznetsov <akuznetsov@apache.org>:
>
>> Vova,
>>
>> Finally we back to my initial idea - to look how "big databases compress"
>> data :)
>>
>>
>> Just to remind how IBM DB2 do this[1].
>>
>> [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-
>> 1205db210compression/
>> <http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/>
>>
>> On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov <vozerov@gridgain.com>
>> wrote:
>>
>> > Vyacheslav,
>> >
>> > This is not about my needs, but about the product :-) BinaryObject is a
>> > central entity used for both data transfer and data storage. This is
>> both
>> > good and bad at the same time.
>> >
>> > Good thing is that as we optimize binary protocol, we improve both
>> network
>> > and storage performance at the same time. We have at least 3 things
>> which
>> > will be included into the product soon: varint encoding [1], optimized
>> > string encoding [2] and null-field optimization [3]. Bad thing is that
>> > binary object format is not well suited for data storage optimizations,
>> > including compression. For example, one good compression technique is to
>> > organize data in column-store format, or to introduce shared
>> "dictionary"
>> > with unique values on cache level. In both cases N equal values are not
>> > stored N times. Instead, we store one value and N references to it, or
>> so.
>> > This way 2x-10x compression is possible depending on workload type.
>> Binary
>> > object protocol with some compression on top of it cannot give such
>> > improvement, because it will compress data in individual objects,
>> instead
>> > of compressing the whole cache data in a single context.
>> >
>> > That said, I propose to give up adding compression to BinaryObject.
>> This is
>> > a dead end. Instead, we should:
>> > 1) Optimize protocol itself to be more compact, as described in
>> > aforementioned Ignite tickets
>> > 2) Start new discussion about storage compression
>> >
>> > You can read papers of other vendors to get better understanding on
>> > possible compression options. E.g. Oracle has a lot of compression
>> > techniques, including heat maps, background compression, per-block
>> > compression, data dictionaries, etc. [4].
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-5097
>> > [2] https://issues.apache.org/jira/browse/IGNITE-5655
>> > [3] https://issues.apache.org/jira/browse/IGNITE-3939
>> > [4] http://www.oracle.com/technetwork/database/options/
>> > compression/advanced-
>> > compression-wp-12c-1896128.pdf
>> >
>> > Vladimir.
>> >
>> >
>>
>> --
>> Alexey Kuznetsov
>>
>
>
>
> --
> Best Regards, Vyacheslav D.
>



-- 
Best Regards, Vyacheslav D.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message