From: Anton Vinogradov
Date: Tue, 27 Mar 2018 13:05:58 +0300
Subject: Re: Data compression design proposal
To: dev@ignite.apache.org
>> page compression at rebalancing is a good idea

Even if we have problems with storing on disk. BTW, do we have, or are we
going to have, rebalancing based on page streaming instead of entry
streaming?

2018-03-27 3:03 GMT+03:00 Dmitriy Setrakyan:

> AG,
>
> I would also ask about the compression itself. How and where do we store
> the compression meta information? We cannot be compressing every page
> separately, it will not be effective. However, if we try to store the
> compression metadata, how do we make other nodes aware of it? Has this
> been discussed?
>
> D.
>
> On Mon, Mar 26, 2018 at 8:53 AM, Alexey Goncharuk
> <alexey.goncharuk@gmail.com> wrote:
>
>> Guys,
>>
>> How does this fit the PageMemory concept? Currently it assumes that the
>> size of the page in memory and the size of the page on disk are the
>> same, so only per-entry level compression within a page makes sense.
>>
>> If you compress a whole page, how do you calculate the page offset in
>> the target data file?
>>
>> --AG
>>
>> 2018-03-26 17:39 GMT+03:00 Vladimir Ozerov:
>>
>>> Gents,
>>>
>>> If I understood the idea correctly, the proposal is to compress pages
>>> on eviction and decompress them on read from disk. Is it correct?
>>>
>>> On Mon, Mar 26, 2018 at 5:13 PM, Anton Vinogradov wrote:
>>>
>>>> +1 to Taras's vision.
>>>>
>>>> Compression on eviction is a good case to store more.
>>>> Pages in memory are always hot in a real system, so compression in
>>>> memory will definitely slow down the system, I think.
>>>>
>>>> Anyway, we can split the issue into "on-eviction compression" and
>>>> "in-memory compression".
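As a side note, the "compress on eviction, decompress on read-back" idea discussed above can be sketched with the JDK's built-in deflate codec. This is a minimal, hypothetical illustration, not Ignite's actual API; the class and method names are made up for the example:

```java
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: deflate a page buffer on eviction, inflate it on read-back. */
public class PageEvictionCompressor {
    /** Deflates a raw page; the result may exceed the input size for incompressible data. */
    public static byte[] compressPage(byte[] page) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(page);
        deflater.finish();
        // Worst-case scratch buffer: deflate can expand incompressible input slightly.
        byte[] buf = new byte[page.length * 2 + 64];
        int len = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, len);
    }

    /** Inflates a compressed page back to its known original size. */
    public static byte[] decompressPage(byte[] compressed, int originalSize) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalSize];
        int len = inflater.inflate(out);
        inflater.end();
        if (len != originalSize)
            throw new IllegalStateException("Unexpected page size: " + len);
        return out;
    }
}
```

Note that, as Alexey points out above, a whole-page scheme like this changes the on-disk page size, so page offsets in the data file can no longer be computed as pageIndex * pageSize.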
>>>>
>>>> 2018-03-06 12:14 GMT+03:00 Taras Ledkov:
>>>>
>>>>> Hi,
>>>>>
>>>>> I guess page-level compression makes sense on page loading / eviction.
>>>>> In this case we can decrease I/O operations and reach a performance
>>>>> boost.
>>>>> What is the goal of in-memory compression? To hold about 2-5x more
>>>>> data in memory with a performance drop?
>>>>>
>>>>> Also, please clarify the case of compression/decompression for hot
>>>>> and cold pages.
>>>>> Is this right for your approach:
>>>>> 1. Hot pages are always decompressed in memory because many
>>>>> read/write operations touch them.
>>>>> 2. So we can compress only cold pages.
>>>>>
>>>>> If so, the approach is suitable when the hot data size << available
>>>>> RAM size.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> On 05.03.2018 20:18, Vyacheslav Daradur wrote:
>>>>>
>>>>>> Hi Igniters!
>>>>>>
>>>>>> I'd like to take the next step in our data compression discussion [1].
>>>>>>
>>>>>> Most Igniters voted for per-data-page compression.
>>>>>>
>>>>>> I'd like to accumulate the main theses to start implementation:
>>>>>> - a page will be compressed with a dictionary-based approach (e.g. LZV)
>>>>>> - a page will be compressed in batch mode (not on every change)
>>>>>> - page compression should be initiated by an event, for example, when
>>>>>>   a page's free space drops below 20%
>>>>>> - the compression process will run under the page write lock
>>>>>>
>>>>>> Vladimir Ozerov has written:
>>>>>>
>>>>>>> What we do not understand yet:
>>>>>>> 1) Granularity of the compression algorithm.
>>>>>>> 1.1) It could be per-entry - i.e. we compress the whole entry
>>>>>>> content, but respect boundaries between entries.
>>>>>>> E.g.: before - [ENTRY_1][ENTRY_2], after -
>>>>>>> [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2] (as opposed to
>>>>>>> [COMPRESSED ENTRY_1 and ENTRY_2]).
>>>>>>> 1.2) Or it could be per-field - i.e. we compress fields, but respect
>>>>>>> the binary object layout. The first approach is simple,
>>>>>>> straightforward, and will give an acceptable compression rate, but
>>>>>>> we will have to decompress the whole binary object on every field
>>>>>>> access, which may ruin our SQL performance. The second approach is
>>>>>>> more complex, and we are not sure about its compression rate, but as
>>>>>>> the BinaryObject structure is preserved, we will still have fast
>>>>>>> constant-time per-field access.
>>>>>>
>>>>>> I think there are advantages in both approaches, and we will be able
>>>>>> to compare different approaches and algorithms after a prototype
>>>>>> implementation.
>>>>>>
>>>>>> The main approach in brief:
>>>>>> 1) When a page's free space drops below 20%, a compression event
>>>>>>    will be triggered
>>>>>> 2) The page will be locked by the write lock
>>>>>> 3) The page will be passed to the page compressor implementation
>>>>>> 4) The page will be replaced by the compressed page
>>>>>>
>>>>>> Reading a whole object or a field:
>>>>>> 1) If the page is marked as compressed, it will be handled by the
>>>>>>    page compressor implementation; otherwise, it will be handled as
>>>>>>    usual.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Should we create a new IEP and register tickets to start
>>>>>> implementation? This would allow us to track the feature's progress
>>>>>> and related tasks.
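The four-step flow proposed in the thread (threshold check, write lock, compress, replace) can be sketched as follows. This is an illustrative outline only; `Page`, `PageCompressor`, and `maybeCompress` are hypothetical names, not Ignite's actual API:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch of the proposed flow: when a page's free space drops
 * below 20%, compress it under the page write lock and replace its buffer.
 */
public class PageCompressionTrigger {
    static final double FREE_SPACE_THRESHOLD = 0.20;

    interface PageCompressor {
        byte[] compress(byte[] pageData);
    }

    static class Page {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        byte[] data;
        int freeSpace;      // bytes still unused in the page
        boolean compressed;

        Page(byte[] data, int freeSpace) {
            this.data = data;
            this.freeSpace = freeSpace;
        }
    }

    /** Steps 1-4: check the threshold, take the write lock, compress, replace. */
    static boolean maybeCompress(Page page, PageCompressor compressor) {
        // Step 1: trigger only when free space drops below the threshold.
        if (page.compressed || (double) page.freeSpace / page.data.length >= FREE_SPACE_THRESHOLD)
            return false;

        page.lock.writeLock().lock(); // Step 2: page write lock
        try {
            page.data = compressor.compress(page.data); // Steps 3-4
            page.compressed = true;
            return true;
        } finally {
            page.lock.writeLock().unlock();
        }
    }
}
```

Readers would then check the `compressed` flag, matching the "handled by the page compressor implementation, otherwise as usual" rule from the thread.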
>>>>>>
>>>>>> [1] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-tc20679.html
>>>>>
>>>>> --
>>>>> Taras Ledkov
>>>>> Mail-To: tledkov@gridgain.com
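As an aside on Vladimir's option 1.1 (per-entry granularity), the key property is that each entry is compressed independently, so boundaries survive and a single entry can be located and decompressed without touching its neighbors. A minimal sketch of that property, again with invented names and the JDK deflate codec rather than any real Ignite code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.Deflater;

/**
 * Illustrative sketch of per-entry compression that respects entry
 * boundaries: [ENTRY_1][ENTRY_2] -> [COMPRESSED_ENTRY_1][COMPRESSED_ENTRY_2].
 */
public class PerEntryCompression {
    /** Compresses each entry on its own, yielding one compressed blob per entry. */
    static List<byte[]> compressEntries(List<byte[]> entries) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] entry : entries) {
            Deflater d = new Deflater(Deflater.BEST_SPEED);
            d.setInput(entry);
            d.finish();
            byte[] buf = new byte[entry.length * 2 + 64];
            int len = d.deflate(buf);
            d.end();
            out.add(Arrays.copyOf(buf, len));
        }
        return out; // boundaries preserved: one element per original entry
    }
}
```

The trade-off named in the thread follows directly: each blob decompresses independently (good for point reads), but no dictionary is shared across entries, so the compression rate is lower than compressing the entries together.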