ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Data compression in Ignite
Date Fri, 10 Nov 2017 17:02:53 GMT
Dmitry,

Ignite is used by a variety of applications. Some models I have seen were made
completely of strings, others of longs, decimals, etc. It is
impossible to either prove or disprove which data type is dominant. My
position is based on experience with Ignite users and approaches used in
other databases.

Strings are more complex because your approach assumes that there is a
common dictionary of strings, and references to these strings from data
pages. As soon as you have cross-page references, you are in trouble,
because you need to maintain that dictionary. With the page-based approach we
agreed on previously, the dictionary is generic (i.e. it can compress not only
strings, but any byte sequence) and is located inside the page, meaning
that all you need to maintain this dictionary is the page lock.
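
To illustrate the page-local idea, here is a minimal Java sketch. It is only
an illustration under assumptions: the class DedupPage and its methods
(append, read, dictionarySize) are hypothetical names, not Ignite code, and
plain Java collections stand in for the real off-heap page layout. The point
is that the dictionary lives inside the page and is guarded by that page's
lock alone, so no cross-page references have to be maintained.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical page with a page-local dictionary (not Ignite's page format). */
final class DedupPage {
    /** The only lock needed: dictionary and rows both live inside this page. */
    private final ReentrantLock pageLock = new ReentrantLock();

    /** Byte sequence -> dictionary slot. ByteBuffer compares by content. */
    private final Map<ByteBuffer, Integer> index = new HashMap<>();

    /** Dictionary slots holding each distinct byte sequence exactly once. */
    private final List<byte[]> dictionary = new ArrayList<>();

    /** Rows store a dictionary slot instead of the raw bytes. */
    private final List<Integer> rows = new ArrayList<>();

    /** Appends a value (string bytes or any other byte sequence); returns its row number. */
    int append(byte[] value) {
        pageLock.lock();
        try {
            Integer slot = index.get(ByteBuffer.wrap(value));
            if (slot == null) {
                // First occurrence on this page: store a private copy.
                byte[] copy = value.clone();
                slot = dictionary.size();
                dictionary.add(copy);
                index.put(ByteBuffer.wrap(copy), slot);
            }
            rows.add(slot);
            return rows.size() - 1;
        } finally {
            pageLock.unlock();
        }
    }

    /** Resolves a row back to its bytes through the page-local dictionary. */
    byte[] read(int row) {
        pageLock.lock();
        try {
            return dictionary.get(rows.get(row)).clone();
        } finally {
            pageLock.unlock();
        }
    }

    /** Number of distinct byte sequences stored on this page. */
    int dictionarySize() {
        pageLock.lock();
        try {
            return dictionary.size();
        } finally {
            pageLock.unlock();
        }
    }
}
```

Appending the same bytes twice to one page adds a second row but no second
dictionary entry. A dictionary shared across pages would instead need its own
synchronization and cleanup, which is the complexity mentioned above.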

On Fri, Nov 10, 2017 at 7:02 PM, Dmitry Pavlov <dpavlov.spb@gmail.com>
wrote:

> Hi Vladimir,
>
> In my experience, String is a frequently used data type in business
> applications and, moreover, is often indexed.
> > String type doesn't dominate in user models
> What is the basis of this assumption?
>
> Could you explain why String is more complex than byte[] compression? It
> seems they both require dictionaries.
>
> Sincerely,
> Dmitriy Pavlov
>
> On Fri, Nov 10, 2017 at 18:57, Vladimir Ozerov <vozerov@gridgain.com> wrote:
>
> > This would require a shared dictionary, which is complex to maintain. We
> > evaluated this option, but rejected it due to complexity. Another important
> > thing is that the String type doesn't dominate in user models, so I do not
> > see why it should be a kind of special case.
> >
> > On Fri, Nov 10, 2017 at 18:45, Dmitry Pavlov <dpavlov.spb@gmail.com> wrote:
> >
> > > Vladimir,
> > >
> > > Focusing on strings will also allow us to deduplicate strings in
> > > objects during unmarshalling from page into heap.
> > >
> > > Moreover, this can be a first simple step towards implementing a more
> > > complex algorithm.
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > On Fri, Nov 10, 2017 at 18:19, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > >
> > > > Dmitry,
> > > >
> > > > What we've discussed so far in this topic is essentially the same
> > > > concept. We will deduplicate identical byte sequences at the page level.
> > > >
> > > > On Fri, Nov 10, 2017 at 6:10 PM, Dmitry Pavlov <dpavlov.spb@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Igniters,
> > > > >
> > > > > What do you think about implementing an analogue of the Java G1
> > > > > collector feature 'String deduplication':
> > > > > -XX:+UseG1GC -XX:+UseStringDeduplication
> > > > >
> > > > > In most business applications almost all objects are of type String.
> > > > > As a result, char[] arrays are often at the top of heap usage. To
> > > > > reduce the memory consumed by duplicates, the G1 collector identifies,
> > > > > in the background, strings with equal arrays and deduplicates them so
> > > > > that they share one array instance (possible because String is
> > > > > immutable). Unfortunately, we can't reuse the collector's feature, as
> > > > > Ignite stores data off-heap.
> > > > >
> > > > > What if we consider implementing the same deduplication feature for
> > > > > Ignite Durable Memory?
> > > > >
> > > > > Sincerely,
> > > > > Dmitry Pavlov
> > > > >
> > > > >
> > > > > On Wed, Oct 18, 2017 at 18:52, daradurvs <daradurvs@gmail.com> wrote:
> > > > >
> > > > > > Hi, Igniters!
> > > > > >
> > > > > > Are there any research results or a prototype of the compression
> > > > > > feature?
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > > >
> > > > >
> > > >
> > >
> >
>
