Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAJ3fcbBPLTv8C=dpYejvLAh2VnmjdMuGd_aWJ7R49JiHdn2ZvQ@mail.gmail.com>
References: 
 <CAJ3fcbA4ev=Sz_U7aMwGZfP+_wV3Tk3quX3YM8reZRaKHV3YTw@mail.gmail.com>
	<1892683924.2956880.1456868332338.JavaMail.yahoo@mail.yahoo.com>
	<CAENxBwxPuxwNWfxB6R10xMZP0-qnRYqE-qdr__XYb9z_h+Au=A@mail.gmail.com>
	<316A6CEC-B530-460B-97BF-43F3BF3A738A@gmail.com>
	<CAJ3fcbBPLTv8C=dpYejvLAh2VnmjdMuGd_aWJ7R49JiHdn2ZvQ@mail.gmail.com>
Date: Wed, 2 Mar 2016 15:35:35 +0000
Message-ID: 
 <CAJ3fcbAb0vJvYhwyrvRtdGQFWieDBaB+sZpZFv_P0OfuBN5jmw@mail.gmail.com>
Subject: Re: Hive and Impala
From: Mich Talebzadeh <mich.talebzadeh@gmail.com>
To: user@hive.apache.org
Cc: Ashok Kumar <ashok34668@yahoo.com>
Content-Type: multipart/alternative; boundary=001a11440dfa1a62c3052d12a2c3

--001a11440dfa1a62c3052d12a2c3
Content-Type: text/plain; charset=UTF-8

I forgot to mention that, besides LLAP, you are also going to have Hive Hybrid
Procedural SQL On Hadoop (HPL/SQL), which is going to add another dimension to
Hive.

Dr Mich Talebzadeh


LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com


On 2 March 2016 at 15:30, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> SQL plays an increasingly important role on Hadoop. As of today, Hive IMO
> provides the best and most robust solution for anything resembling a Data
> Warehouse "solution" on Hadoop, chiefly by means of its powerful metastore,
> which can be hosted on a variety of mission-critical databases, plus Hive's
> ever-increasing support for a variety of file types on HDFS, from the humble
> textfile to ORC. The remaining tools are little more than query tools that
> crucially rely on the Hive Metastore for their needs. Take away the Hive
> component and they are more or less lame ducks.
>
> Hive on MR was perceived to be slow, but, what the heck, we are talking
> about a Data Warehouse here, which for the most part should be
> batch-oriented and not user-facing. In Hive 0.14 and 2.0 you can use Spark
> and Tez as the execution engine, and if you are well into functional
> programming, you can deploy Spark on Hive. If you look around, from Impala
> to Spark, the architecture is essentially a query tool.
>
> On 2 March 2016 at 13:52, Dayong <willddy@gmail.com> wrote:
>
>> As I remember, a few weeks ago in the Hadoop Weekly news feed, Cloudera
>> had a benchmark showing Impala is a little better than Spark SQL and Hive
>> with Tez. You can check that. From my experience, Hive is still the leading
>> tool for regular ETL jobs since it is stable. The other tools are better for
>> ad hoc and interactive query use cases. Cloudera is betting on Impala,
>> especially with its new Kudu project.
>>
>> Thanks,
>> Dayong
>>
>> On Mar 1, 2016, at 5:14 PM, Edward Capriolo <edlinuxguru@gmail.com>
>> wrote:
>>
>> My knocks on Impala (not intended to be a post knocking Impala).
>>
>> Impala really has not delivered on the complex types that Hive has (after
>> promising them for quite a while); also, it only works with the 'blessed'
>> input formats: Parquet, Avro and text.
>>
>> It is very annoying to work with Impala. In my version, if you create a
>> partition in Hive, Impala does not see it; you have to run "refresh".
>>
>> In Impala I do not have all the UDFs that Hive has, like percentile, etc.
>>
>> Impala is fast. Many data-analyst / data-scientist types can't wait 10
>> seconds for a query, so when I need to produce something for them I make
>> sure the data has no complex types and uses a table type that Impala
>> understands.
>>
>> But for my own work I still work primarily in Hive, because I do not want
>> to deal with all the things that Impala does not have/might have, and when
>> I need something special, like my own UDFs, it is easier to whip up the
>> solution in Hive.
>>
>> Having worked with M$ SQL Server and Vertica, I find Impala on par with
>> them, but I don't think of it like I think of Hive. To me it just feels like
>> a Vertica where I can sometimes cheat on loading because it is backed by
>> HDFS.
>>
>> Hive is something different: I am making pipelines, transforming data,
>> doing streaming, writing custom UDFs, querying JSON directly. It's not !=
>> Impala.
>>
>> ::random message of the day::
>>
>>
>>
>>
>> On Tue, Mar 1, 2016 at 4:38 PM, Ashok Kumar <ashok34668@yahoo.com> wrote:
>>
>>>
>>> Dr Mich,
>>>
>>> My two cents here.
>>>
>>> I don't have direct experience of Impala, but in my humble opinion I
>>> share your view that Hive provides the best metastore of all Big Data
>>> systems. Looking around, almost every product in one form or another uses
>>> Hive code somewhere. My colleagues inform me that Hive is one of the most
>>> stable Big Data products.
>>>
>>> With the capabilities of Spark on Hive and Hive on Spark or Tez, plus of
>>> course MR, there is really little need for many other products in the same
>>> space. It is good to keep things simple.
>>>
>>> Warmest
>>>
>>>
>>> On Tuesday, 1 March 2016, 11:33, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>
>>> I have not heard much of Impala lately. I saw an article on LinkedIn titled
>>>
>>> "Apache Hive Or Cloudera Impala? What is Best for me?"
>>>
>>> "We can access all objects from Hive data warehouse with HiveQL which
>>> leverages the map-reduce architecture in background for data retrieval and
>>> transformation and this results in latency."
>>>
>>> My response was
>>>
>>> This statement is no longer valid, as you now have a choice of three
>>> engines: MR, Spark and Tez. I have not used Impala myself, as I don't think
>>> there is a need for it with Hive on Spark or Spark using the Hive metastore
>>> providing whatever is needed. Hive is for Data Warehouse and does what it
>>> says on the tin. Please also bear in mind that Hive offers ORC storage
>>> files that provide storage index capabilities, further optimizing queries
>>> with additional stats at file, stripe and row-group levels.
>>>
>>> Anyway, the question is: with Hive on Spark or Spark using the Hive
>>> metastore, what can we achieve with Impala that we cannot achieve with them?
>>>
>>
>
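A note on the ORC point raised in the quoted thread (statistics kept at file, stripe and row-group levels): below is a minimal Python sketch of why such statistics can speed up queries. It assumes a simplified model in which each stripe records only the min and max of a single integer column. `StripeStats`, `make_stripe`, `stripes_to_read` and `scan` are hypothetical names for illustration, not ORC's reader API, and real ORC statistics cover more types and fields than shown here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StripeStats:
    """Hypothetical stand-in for one stripe: its min/max statistics plus its rows."""
    min_value: int
    max_value: int
    rows: List[int] = field(default_factory=list)


def make_stripe(rows: List[int]) -> StripeStats:
    """Build a stripe and record min/max statistics for its (non-empty) rows."""
    return StripeStats(min(rows), max(rows), rows)


def stripes_to_read(stripes: List[StripeStats], lo: int, hi: int) -> List[int]:
    """Indexes of stripes whose [min, max] range can overlap lo <= x <= hi.

    Stripes that fail this test are skipped without touching their rows.
    """
    return [i for i, s in enumerate(stripes)
            if s.max_value >= lo and s.min_value <= hi]


def scan(stripes: List[StripeStats], lo: int, hi: int) -> Tuple[List[int], List[int]]:
    """Read only the selected stripes, then apply the predicate row by row."""
    selected = stripes_to_read(stripes, lo, hi)
    matches: List[int] = []
    for i in selected:
        matches.extend(v for v in stripes[i].rows if lo <= v <= hi)
    return selected, matches


if __name__ == "__main__":
    stripes = [make_stripe([1, 5, 9]),
               make_stripe([10, 15, 19]),
               make_stripe([20, 22, 29])]
    print(scan(stripes, 12, 18))  # ([1], [15])
```

In this simplified model only the middle stripe can hold values between 12 and 18, so the other two are never scanned; the same min/max test applied per row group would let a reader skip work at a finer granularity.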


--001a11440dfa1a62c3052d12a2c3--