From: "Owen O'Malley"
Date: Mon, 26 Mar 2018 16:23:10 -0700
Subject: Re: ORC double encoding optimization proposal
To: dev@orc.apache.org
Cc: user@orc.apache.org

This is a really interesting conversation. Of course, the original use case for ORC was that you were never reading less than a stripe. So putting all of the data streams for a column back to back, which isn't in the spec but should be, was optimal in terms of seeks.

There are two cases that violate this assumption:
* you are using predicate push down and thus only need to read a few row groups (see the sketch after this list).
* you are extending the reader to interleave the compression and I/O.
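
To put a rough number on the first case, here is a minimal sketch (not ORC code; the stream names, offsets, and helper are made up for illustration) of what a predicate-pushdown read of one row group costs when a column's streams sit back to back for the whole stripe:

import java.util.ArrayList;
import java.util.List;

// Illustrative only, not ORC code: with a column's streams laid out back to
// back for the whole stripe, reading a single row group for predicate
// pushdown still needs one ranged read (one seek) per stream.
public class RowGroupSeekSketch {

  // A byte range [offset, offset + length) within the stripe.
  record Range(long offset, long length) {}

  // streamStart[s] = where stream s begins (streams back to back);
  // rgOffset[s][g] = offset of row group g inside stream s (what a row index records);
  // rgLength[s][g] = bytes of row group g inside stream s.
  static List<Range> rangesForRowGroup(long[] streamStart, long[][] rgOffset,
                                       long[][] rgLength, int g) {
    List<Range> ranges = new ArrayList<>();
    for (int s = 0; s < streamStart.length; s++) {
      ranges.add(new Range(streamStart[s] + rgOffset[s][g], rgLength[s][g]));
    }
    return ranges;  // one discontiguous range per stream
  }

  public static void main(String[] args) {
    // A column with 3 streams (say PRESENT, DATA, LENGTH), two row groups each.
    long[] streamStart = {0, 4_000, 200_000};
    long[][] rgOffset  = {{0, 2_000}, {0, 100_000}, {0, 20_000}};
    long[][] rgLength  = {{2_000, 2_000}, {100_000, 96_000}, {20_000, 18_000}};
    // Reading only row group 1 touches 3 widely separated byte ranges: 3 seeks.
    System.out.println(rangesForRowGroup(streamStart, rgOffset, rgLength, 1));
  }
}

Every extra stream is an extra seek per row group, which is what either of the layouts below is trying to avoid.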

So a couple of layouts come to mind:

* Finish the compression chunks at the row group (10k rows) and interleave the streams for the column for each row group.
  This would help with both predicate pushdown and the async I/O reader.
  We would lose some compression by closing the compression chunks early and have additional overhead to track the sizes for the row group.
  On the plus side, we could simplify the indexes because the compression chunks would always align with row groups.

* Divide each 256k (larger?) block among the streams in proportion to their sizes. Thus if the column has 3 streams and they were 50%, 30%, and 20%, we would take that much data from each 256k.
  This wouldn't reduce the compression or require any additional metadata, since the reader could determine the number of bytes of each stream per "page".
  This wouldn't help very much for PPD, but would help for the async I/O reader.
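
To make the second layout concrete, here is a minimal sketch of the per-page arithmetic (not ORC code; the rounding rule and all names are assumptions). Since the stripe footer already records the total size of every stream, writer and reader can both derive each stream's share of a fixed-size page:

// Illustrative only, not ORC code: each fixed-size "page" carries a slice of
// every stream in proportion to the streams' total sizes for the stripe, so
// the reader can recompute the split without extra metadata. The rounding
// rule (floor, remainder to the last stream) is an assumption made so the
// example is deterministic.
public class ProportionalPageSketch {

  static long[] bytesPerStreamInPage(long[] streamTotals, long pageSize) {
    long stripeTotal = 0;
    for (long total : streamTotals) {
      stripeTotal += total;
    }
    long[] slice = new long[streamTotals.length];
    long used = 0;
    for (int i = 0; i < streamTotals.length - 1; i++) {
      slice[i] = pageSize * streamTotals[i] / stripeTotal;  // floor of the proportional share
      used += slice[i];
    }
    slice[streamTotals.length - 1] = pageSize - used;       // remainder goes to the last stream
    return slice;
  }

  public static void main(String[] args) {
    // Owen's example: 3 streams holding 50%, 30%, and 20% of the column's bytes.
    long[] totals = {5_000_000, 3_000_000, 2_000_000};
    long[] slice = bytesPerStreamInPage(totals, 256 * 1024);
    // Prints 131072 + 78643 + 52429 = 262144, i.e. every 256k page splits 50/30/20.
    System.out.printf("%d + %d + %d = %d%n",
        slice[0], slice[1], slice[2], slice[0] + slice[1] + slice[2]);
  }
}

On the read side the same arithmetic tells an async I/O reader where each stream's bytes sit inside every page, which is why no additional metadata is needed.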

So which use case matters the most? What other layouts would be interesting?
.. Owen

On Mon, Mar 26, 2018 at 12:33 PM, Gopal Vijayaraghavan <gopalv@apache.org> wrote:

> the bad thing is that we still have TWO encodings to discuss.

Two is exactly what we need, not five - from the existing ORC configs

hive.exec.orc.encoding.strategy=[SPEED, COMPRESSION];
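
A minimal sketch of flipping that knob, assuming the setting travels on a plain Hadoop Configuration (the property name is the one above; everything else is illustrative):

import org.apache.hadoop.conf.Configuration;

// Illustrative only: the property name is the one quoted above; how a writer
// maps its two values onto the two double encodings is exactly the proposal.
public class EncodingStrategySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("hive.exec.orc.encoding.strategy", "COMPRESSION");
    System.out.println(conf.get("hive.exec.orc.encoding.strategy", "SPEED"));
  }
}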

FLIP8 was my original suggestion to Teddy from the byteuniq UDF runs, though the regressions in compression over the PlainV2 are still bothering me (which is why I went digging into the Zlib dictionary builder impl with infgen).

All comparisons below are for Size & against PlainV2

For Zlib, this is pretty bad for FLIP.

ZLIB:HIGGS Regressing on FLIP by 6 points
ZLIB:DISCOUNT_AMT Regressing on FLIP by 10 points
ZLIB:IOT_METER Regressing on FLIP by 32 points
ZLIB:LIST_PRICE Regressing on FLIP by 36 points
ZLIB:PHONE Regressing on FLIP by 50 points

SPLIT has no size regressions.

With ZSTD, SPLIT has a couple of regressions in size

ZSTD:DISCOUNT_AMT Regressing on FLIP by 5 points
ZSTD:IOT_METER Regressing on FLIP by 17 points
ZSTD:HIGGS Regressing on FLIP by 18 points
ZSTD:LIST_PRICE Regressing on FLIP by 30 points
ZSTD:PHONE Regressing on FLIP by 55 points

ZSTD:HIGGS Regressing on SPLIT by 10 points
ZSTD:PHONE Regressing on SPLIT by 3 points

but FLIP still has more size regressions & big ones there.

I'm continuing to mess with both algorithms, but I have wider problems to fix in FLIP & at a lower algorithm level than in SPLIT.

Cheers,
Gopal


