ignite-dev mailing list archives

From Alexey Kuznetsov <akuznet...@gridgain.com>
Subject Re: Data compression in Ignite 2.0
Date Wed, 27 Jul 2016 06:24:12 GMT
Nikita,

That was my intention: "we may need to provide a better facility to inject
user's logic here..."

Andrey,
About compression, once again - DB2 is a row-based DB and they can compress
:)
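
To make the dictionary-substitution idea concrete for row-based data, here is a minimal sketch in Python (chosen for brevity; it is not Ignite code). The names `Ref`, `build_dictionary`, `compress_row` and `decompress_row`, and the `min_len` / `max_entries` thresholds, are hypothetical. The sketch assumes the dictionary is built once from a sample of rows and that only repeated string values above a minimum size are worth substituting.

```python
from collections import Counter, namedtuple

# Marker that stands in for a value stored in the dictionary (hypothetical).
Ref = namedtuple("Ref", "code")


def build_dictionary(rows, min_len=4, max_entries=256):
    """Build a value -> code dictionary once, from a sample of rows."""
    counts = Counter(
        value
        for row in rows
        for value in row.values()
        if isinstance(value, str) and len(value) >= min_len
    )
    # A value that occurs only once gains nothing from substitution.
    frequent = [v for v, n in counts.most_common(max_entries) if n > 1]
    return {value: code for code, value in enumerate(frequent)}


def compress_row(row, dictionary):
    """Replace field values found in the dictionary with Ref markers."""
    return {
        key: Ref(dictionary[value])
        if isinstance(value, str) and value in dictionary
        else value
        for key, value in row.items()
    }


def decompress_row(row, dictionary):
    """Restore the original field values from Ref markers."""
    reverse = {code: value for value, code in dictionary.items()}
    return {
        key: reverse[value.code] if isinstance(value, Ref) else value
        for key, value in row.items()
    }


rows = [
    {"city": "Novosibirsk", "n": 1},
    {"city": "Novosibirsk", "n": 2},
    {"city": "Omsk", "n": 3},
]
dictionary = build_dictionary(rows)
packed = [compress_row(row, dictionary) for row in rows]
restored = [decompress_row(row, dictionary) for row in packed]
```

No schema or column layout is needed here: the dictionary is keyed by value, so it works on rows as they are stored. How well it pays off depends entirely on how often values repeat.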

On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivanov30@gmail.com> wrote:

> Very good points indeed. I get the "compression in Ignite" question quite
> often, and the Hana reference is a typical lead-in.
>
> My personal opinion is still that in Ignite *specifically* the compression
> is best left to the end-user. But we may need to provide a better facility
> to inject user's logic here...
>
> --
> Nikita Ivanov
>
>
> On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <andrewkornev@hotmail.com>
> wrote:
>
> > Dictionary compression requires some knowledge about the data being
> > compressed. For example, for numeric types the range of values must be
> > known so that the dictionary can be generated. For strings, the number
> > of unique values in the column is the key input to dictionary
> > generation.
> > SAP HANA is a column-based database system: it stores the fields of the
> > data tuple individually, using the best compression for the given data
> > type and the particular set of values. HANA has been specifically built
> > as a general purpose database, rather than as an afterthought layer on
> > top of an already existing distributed cache.
> > On the other hand, Ignite is a distributed cache implementation (a pretty
> > good one!) that in general requires no schema and stores its data in a
> > row-based fashion. Its current design doesn't lend itself readily to the
> > kind of optimizations HANA provides out of the box.
> > For the curious types among us, the implementation details of HANA are
> > well documented in "In-Memory Data Management" by Hasso Plattner and
> > Alexander Zeier.
> > Cheers
> > Andrey
> > _____________________________
> > From: Alexey Kuznetsov <akuznetsov@gridgain.com>
> > Sent: Tuesday, July 26, 2016 5:36 AM
> > Subject: Re: Data compression in Ignite 2.0
> > To: <dev@ignite.apache.org>
> >
> >
> > Sergey Kozlov wrote:
> > >> For approach 1: Putting a large object into a partitioned cache will
> > >> force an update of the dictionary placed in a replicated cache. It may
> > >> be a time-expensive operation.
> > The dictionary will be built only once. And we could control what should
> > be put into the dictionary: for example, we could check the min and max
> > size and decide whether to put a value into the dictionary or not.
> >
> > >> Approaches 2-3 make sense for rare cases, as Sergi commented.
> > But it is better to at least have the possibility to plug in user code
> > for compression than not to have it at all.
> >
> > >> Also I see a danger of OOM if we've got a high compression level and
> > >> try to restore the original value in memory.
> > We could easily get an OOM with many other operations right now, without
> > compression. I think it is not an issue; we could add a NOTE to the
> > documentation about such a possibility.
> >
> > Andrey Kornev wrote:
> > >> ... in general I think compression is a great idea. The cleanest way
> > >> to achieve that would be to just make it possible to chain the
> > >> marshallers...
> > I think it is also a good idea. And it looks like it could be used for
> > compression with some sort of ZIP algorithm, but how do we deal with
> > compression by dictionary substitution?
> > We need to build the dictionary first. Any ideas?
> >
> > Nikita Ivanov wrote:
> > >> SAP Hana does the compression by 1) compressing SQL parameters before
> > >> execution...
> > Looks interesting, but my initial point was about compression of cache
> > data, not SQL queries.
> > My idea was to make compression transparent to the SQL engine when it
> > looks up data.
> >
> > But the idea of compressing SQL query results looks very interesting,
> > because it is a known fact that the SQL engine can consume quite a lot
> > of heap for storing result sets.
> > I think this should be discussed in a separate thread.
> >
> > Just for your information, in the first message I mentioned that DB2 has
> > compression by dictionary, and according to them it is possible to
> > compress usual data to 50-80%.
> > I have some experience with DB2 and can confirm this.
> >
> > --
> > Alexey Kuznetsov
>


-- 
Alexey Kuznetsov
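
The "chain the marshallers" suggestion quoted in this thread can be illustrated with a small decorator-style sketch. It is written in Python for brevity and does not use Ignite's marshaller API: `PickleMarshaller` and `CompressingMarshaller` are hypothetical names, `pickle` stands in for whatever serializer is already in place, and `zlib` (DEFLATE) stands in for "some sort of ZIP algorithm".

```python
import pickle
import zlib


class PickleMarshaller:
    """Innermost link in the chain: turns objects into bytes and back."""

    def marshal(self, obj):
        return pickle.dumps(obj)

    def unmarshal(self, data):
        return pickle.loads(data)


class CompressingMarshaller:
    """Wraps another marshaller and compresses whatever bytes it produces."""

    def __init__(self, inner, level=6):
        self.inner = inner
        self.level = level  # zlib compression level, 0 (none) to 9 (best)

    def marshal(self, obj):
        return zlib.compress(self.inner.marshal(obj), self.level)

    def unmarshal(self, data):
        return self.inner.unmarshal(zlib.decompress(data))


plain = PickleMarshaller()
chained = CompressingMarshaller(plain)

value = {"name": "x" * 1000, "id": 42}
blob = chained.marshal(value)
roundtrip = chained.unmarshal(blob)
```

A stream compressor fits this shape because each value is compressed independently. Dictionary substitution does not fit as easily, since the wrapper would need access to a dictionary shared across values, which is the open question raised above.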
