Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
From: "Mich Talebzadeh" <mich@peridale.co.uk>
To: <user@hive.apache.org>
References: <015001d14227$2e852310$8b8f6930$@peridale.co.uk>
 <3EDBB066-514E-4C4C-90D1-E1802B05A7FB@gmail.com>
 <01aa01d14300$4f4afaf0$ede0f0d0$@peridale.co.uk>
 <2F5F6F5F-FA73-4D85-82BD-3A60281EBD9D@gmail.com>
 <01e101d14337$071e65f0$155b31d0$@peridale.co.uk>
 <CANXtaKB6+Uy4W3i2EDMdDtu29WGiXp8v4mm_XoifsrAk1Ndruw@mail.gmail.com>
 <025d01d143f9$da9683f0$8fc38bd0$@peridale.co.uk>
 <6990B863-FFCF-4D72-BE98-D7A525337E25@gmail.com>
In-Reply-To: <6990B863-FFCF-4D72-BE98-D7A525337E25@gmail.com>
Subject: RE: Running the same query on 1 billion rows fact table in Hive on
 Spark compared to Sybase IQ columnar database
Date: Thu, 31 Dec 2015 18:54:33 -0000
Message-ID: <026601d143fc$b00c22e0$102468a0$@peridale.co.uk>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0267_01D143FC.B00E6CD0"
Thread-Index: 
 AQKc9qhsJZKxINn8+BYOGrjBQgOVvQJjrsKYAqtTgjUBS+vqHgGpWwNNAvaXKdwBw0FGAAGfzXmmnNsantA=
Content-Language: en-gb

This is a multipart message in MIME format.

------=_NextPart_000_0267_01D143FC.B00E6CD0
Content-Type: text/plain;
	charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I agree, but Hive on Spark 1.3.1 is the only combination I have managed
to get working. Even so, it is more than twice as fast as Hive on
MapReduce.

Just to clarify: my understanding is that the optimiser is provided by
Hive and is the same for both execution engines. Is there anything
specific that Spark 1.3.1 lacks compared to Spark 1.5.1 when executing
the query?

Thanks

Mich Talebzadeh

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential.
This message is for the designated recipient only, if you are not the
intended recipient, you should destroy it immediately. Any information
in this message shall not be understood as given or endorsed by Peridale
Technology Ltd, its subsidiaries or their employees, unless expressly so
stated. It is the responsibility of the recipient to ensure that this
email is virus free, therefore neither Peridale Ltd, its subsidiaries
nor their employees accept any responsibility.

From: Jörn Franke [mailto:jornfranke@gmail.com]
Sent: 31 December 2015 18:44
To: user@hive.apache.org
Subject: Re: Running the same query on 1 billion rows fact table in Hive
 on Spark compared to Sybase IQ columnar database

You are using an old version of Spark, and it cannot leverage all of
Hive's optimizations, so I do not think the conclusion is as
straightforward as it might seem.


On 31 Dec 2015, at 19:34, Mich Talebzadeh <mich@peridale.co.uk> wrote:

OK guys.

I have not yet succeeded in installing TEZ, so I cannot try the query on
TEZ as well.

Just a reminder that the query used is a pretty common one: get the
total amount sold for each calendar month from the sales (1 billion
rows) and times tables.

SELECT t.calendar_month_desc, SUM(s.amount_sold)
FROM sales s, times t WHERE s.time_id = t.time_id
GROUP BY t.calendar_month_desc;

In total, 48 rows are returned.

Now, having thought about it: granted, TEZ is going to be faster than
MR, as it is basically MR with a DAG thrown at it. On the other hand,
Spark will have both a DAG and in-memory computation.

The results are as follows:

Optimiser  Engine       Timing            Compression  Total table size
Hive       MapReduce    4673.035 seconds  Snappy       totalSize=2678882153 = 2.5GB
Hive       Spark 1.3.1  1578.817 seconds  Snappy
Columnar   Sybase IQ      30.000 seconds  Native       5GB

It is pretty obvious that Spark outperforms MapReduce by more than a
factor of two, even taking into account the number of rows in the FACT
table, and frankly I would not expect TEZ to beat Spark (that remains to
be seen). Having said that, Hive storage is twice as efficient (2.5GB
against 5GB in Sybase IQ), but I am not sure what one can do to improve
the performance. The table in Hive is stored as an ORC table, and it has
crossed my mind that maybe we should think about storing every column of
an ORC table as an index. That may improve the performance further.
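
As a quick check on the ratios behind those statements, the figures in
the table can be fed straight back into Hive. This is only a sketch: it
assumes a Hive release that accepts SELECT without a FROM clause (0.13
or later, as far as I know), and the column aliases are made up for
illustration.

```sql
-- Ratios taken from the timings and sizes quoted in the table above.
SELECT 4673.035 / 1578.817               AS mr_over_spark,  -- MapReduce vs Spark
       1578.817 / 30.000                 AS spark_over_iq,  -- Spark vs Sybase IQ
       2678882153 / (1024 * 1024 * 1024) AS orc_size_gib;   -- bytes to GiB
```

The three values should come out at roughly 2.96, 52.6 and 2.49, so
Spark is close to three times faster than MapReduce here, while Sybase
IQ is still some fifty times faster than Spark.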

HTH

Mich Talebzadeh


From: Marcin Tustin [mailto:mtustin@handybook.com]
Sent: 30 December 2015 19:27
To: user@hive.apache.org
Subject: Re: Running the same query on 1 billion rows fact table in Hive
 on Spark compared to Sybase IQ columnar database

I'm using TEZ 0.7.0.2.3 with Hive 1.2.1.2.3. I can confirm that TEZ is
much faster than MR in pretty much all cases. Also, with Hive, you'll
want to make sure you've performed optimizations such as aligning ORC
stripe sizes with HDFS block sizes, and concatenated your tables (not so
much an optimization as a must for avoiding the small files problem).
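
A minimal sketch of how one might check and apply those two points from
the Hive prompt. The table name sales is taken from this thread; whether
it is partitioned, and which stripe and block sizes suit it, are not
known here, so treat this as a starting point rather than a recipe.

```sql
-- Print the HDFS block size visible to the session (read-only).
SET dfs.blocksize;

-- Show table parameters such as numFiles and totalSize, plus any orc.*
-- properties (for example orc.stripe.size) that are set on the table.
DESCRIBE FORMATTED sales;

-- Merge the table's small ORC files into larger ones. For a partitioned
-- table, a PARTITION (...) clause may be needed after the table name.
ALTER TABLE sales CONCATENATE;
```

Comparing numFiles before and after the CONCATENATE gives a rough idea
of how much of a small files problem there was to begin with.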

On Wed, Dec 30, 2015 at 2:19 PM, Mich Talebzadeh <mich@peridale.co.uk> wrote:

Thanks again Jorn.

Both Hive and Sybase IQ are running on the same host. Yes, for Sybase IQ
I have compression enabled. The FACT table in IQ (sales) has LF (read:
bitmap) indexes on the time_id column. For the dimension table (times) I
have time_id defined as the primary key. Also, Sybase IQ creates FP
(fast projection) indexes on every column by default.

Anyway, I am trying to download and build TEZ. Do we know which version
of TEZ works with Hive 1.2.1, please? 0.8 seems to be in alpha.

Thanks

Mich Talebzadeh



------=_NextPart_000_0267_01D143FC.B00E6CD0--