Date: Wed, 2 Mar 2016 18:26:00 +0000
Subject: Re: Hive and Impala
From: Mich Talebzadeh <mich.talebzadeh@gmail.com>
To: user@hive.apache.org
Cc: Ashok Kumar <ashok34668@yahoo.com>

OK, two questions here please:

  1. Which version of Hive are you running?
  2. Have you tried Hive on Spark, which does both DAG and in-memory calculation? (A configuration sketch follows this list.) A Hive on Spark run reports its job stages like this:

       Query Hive on Spark job[1] stages:
       INFO  : 2
       INFO  : 3
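
For reference, a minimal sketch of what switching to Hive on Spark looks like, assuming a Hive release built with Spark support (Hive 1.1 or later); the database and table names below are made up:

    -- From beeline or the Hive CLI: switch the session's execution engine
    SET hive.execution.engine=spark;   -- default is mr; tez is the other option
    SET spark.master=yarn;             -- let YARN host the Spark executors

    -- Subsequent queries are planned as a Spark DAG rather than chained MR jobs
    SELECT country, COUNT(*) AS txn_count
    FROM   sales_db.transactions       -- hypothetical table
    GROUP BY country;
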
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 2 March 2016 at 18:14, Dayong <willddy@gmail.com> wrote:
Tez is kind of outdated and ORC is so dedicated to Hive. In addition, the Hive metadata store can be decoupled from Hive as well. In reality, we do suffer from Hive's performance even for ETL jobs. As a result, we'll switch to Impala + Spark/Flink.

Thanks,
Dayong

On Mar 2, 2016, at 10:35 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

I forgot: besides LLAP, you are going to have Hive Hybrid Procedural SQL On Hadoop (HPL/SQL), which is going to add another dimension to Hive.
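
For a flavour of that dimension, a minimal HPL/SQL sketch (assuming the hplsql tool bundled with Hive 2.0; the procedure and its use are made up):

    CREATE PROCEDURE greet(name STRING)
    BEGIN
      PRINT 'Hello, ' || name || '!';
    END;

    CALL greet('Hive');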


On 2 March 2016 at 15:30, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
SQL plays an increasingly important role on Hadoop. As of today, Hive IMO provides the best and most robust solution to anything resembling a data warehouse "solution" on Hadoop, chiefly by means of its powerful metastore, which can be hosted on a variety of mission-critical databases, plus Hive's ever-increasing support for a variety of file types on HDFS, from the humble text file to ORC. The remaining tools are little more than query tools that crucially rely on the Hive metastore for their needs. Take away the Hive component and they are more or less lame ducks.
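
A small illustration of that dependency (a sketch; the table is hypothetical): a table declared once through Hive is resolvable by any engine pointed at the same metastore.

    -- Declared once via HiveQL...
    CREATE TABLE web_logs (ts TIMESTAMP, url STRING, status INT)
    STORED AS PARQUET;

    -- ...then queryable unchanged from Spark SQL or Impala, because both
    -- resolve the table definition through the shared Hive metastore.
    SELECT status, COUNT(*) FROM web_logs GROUP BY status;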

Hive on MR was perceived to be slow, but what the heck, we are talking about a data warehouse here, which for the most part should be batch oriented and not user-facing. In Hive 0.14 and 2.0 you can use Spark and Tez as the execution engine, and if you are well into functional programming, you can deploy Spark on Hive. If you look around, everything from Impala to Spark is architecturally essentially a query tool.




On 2 March 2016 at 13:52, D= ayong <willddy@gmail.com> wrote:
As I remember from the Hadoop Weekly news feed a few weeks ago, Cloudera has a benchmark showing Impala is a little better than Spark SQL and Hive with Tez. You can check that. From my experience, Hive is still the leading tool for regular ETL jobs since it is stable. The other tools are better for ad hoc and interactive query use cases. Cloudera bets on Impala, especially with its new Kudu project.

Thanks,
Dayong

On Mar 1, 2016, at 5:14 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

My knocks on Impala (not intended to be a post knocking Impala):

Impala really has not delivered on the complex types that Hive has (after promising them for quite a while); also, it only works with the 'blessed' input formats: Parquet, Avro, text.

It is very annoying to work with Impala: in my version, if you create a partition in Hive, Impala does not see it. You have to run "refresh".
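
For anyone hitting the same thing, the round trip looks roughly like this (a sketch; the table and partition are hypothetical):

    -- In Hive: add a new partition
    ALTER TABLE sales ADD PARTITION (ds='2016-03-01');

    -- In impala-shell: Impala's cached catalog does not pick it up until you refresh
    REFRESH sales;                 -- reload metadata for an existing table
    -- INVALIDATE METADATA sales;  -- heavier option, needed for a brand-new table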

In Impala I do not have all the UDFs that Hive has, like percentile, etc.
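
For example, Hive's built-in percentile UDAFs (a sketch; the table and columns are hypothetical):

    SELECT percentile(response_ms, 0.95),            -- exact, integer columns only
           percentile_approx(response_sec, 0.95)     -- approximate, works on doubles
    FROM   web_requests;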

Impala is fast. Many data-analyst / data-scientist types can't wait 10 seconds for a query, so when I need to produce something for them I make sure the data has no complex types and uses a table type that Impala understands.
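
That preparation step can be as simple as a CTAS in Hive that flattens the complex columns into a Parquet table Impala can read (a sketch; the table layout is made up):

    -- events.items is assumed to be an array<struct<sku:string, qty:int>>
    CREATE TABLE events_flat STORED AS PARQUET AS
    SELECT e.event_id,
           item.sku AS item_sku,
           item.qty AS item_qty
    FROM   events e
    LATERAL VIEW explode(e.items) t AS item;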

But for my work I still work primarily in Hive, because I do not want to deal with all the things that Impala does not have (or might have), and when I need something special, like my own UDFs, it is easier to whip up the solution in Hive.

Having worked with M$ SQL Server and Vertica, Impala is on par with them, but I don't think of it like I think of Hive. To me it just feels like a Vertica that I can sometimes cheat at loading because it is backed by HDFS.

Hive is something different: I am making pipelines, transforming data, doing streaming, writing custom UDFs, querying JSON directly. It's just not the same thing as Impala.
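
The JSON point, for instance, needs nothing beyond a built-in UDF (a sketch; the table and JSON layout are hypothetical):

    SELECT get_json_object(raw_json, '$.user.id')    AS user_id,
           get_json_object(raw_json, '$.event.type') AS event_type
    FROM   raw_events;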

::random message of the day::


On Tue, Mar 1, 2016 at 4:38 PM, Ashok Kumar <ashok34668@yahoo.com> wrote:

Dr Mitch,

My two cents here.

I don't have direct experience of Impala, but in my humble opinion I share your views that Hive provides the best metastore of all Big Data systems. Looking around, almost every product in one form or shape uses Hive code somewhere. My colleagues inform me that Hive is one of the most stable Big Data products.
With the capabilities of Spark on Hive, and Hive on Spark or Tez, plus of course MR, there is really little need for many other products in the same space. It is good to keep things simple.
Warmest


On Tuesday, 1 March 2016, 11:33, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:


I have not heard much about Impala lately. I saw an article on LinkedIn titled

"Apache Hive Or Cloudera Impala? What is Best for me?"

"We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce architecture in background for data retrieval and transformation and this results in latency."

My response was:

This statement is no longer valid, as you now have a choice of three engines: MR, Spark and Tez. I have not used Impala myself, as I don't think there is a need for it, with Hive on Spark or Spark using the Hive metastore providing whatever is needed. Hive is for data warehousing and does what it says on the tin. Please also bear in mind that Hive offers ORC storage files that provide built-in index capabilities, further optimizing queries with additional stats at file, stripe and row-group level.
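
To make the ORC point concrete, a minimal sketch (the table, columns and properties are chosen for illustration; column bloom filters need Hive 1.2 or later):

    -- ORC keeps min/max statistics at file, stripe and row-group level,
    -- which readers use to skip data during predicate pushdown.
    CREATE TABLE sales_orc (
      customer_id BIGINT,
      amount      DECIMAL(10,2),
      sale_date   DATE
    )
    STORED AS ORC
    TBLPROPERTIES ('orc.compress'='SNAPPY',
                   'orc.bloom.filter.columns'='customer_id');

    SET hive.optimize.index.filter=true;   -- use ORC indexes / row-group stats for filtering
    SELECT SUM(amount) FROM sales_orc WHERE customer_id = 42;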

Anyway, the question is: with Hive on Spark, or Spark using the Hive metastore, what can we not achieve that we can achieve with Impala?






