hive-user mailing list archives

From Ashok Kumar <>
Subject Re: Hive and Impala
Date Tue, 01 Mar 2016 21:38:52 GMT

Dr Mich,
My two cents here.
I don't have direct experience of Impala, but in my humble opinion I share your view that
Hive provides the best metastore of all Big Data systems. Looking around, almost every product
in one form or another uses Hive code somewhere. My colleagues tell me that Hive is one of
the most stable Big Data products.
With the capabilities of Spark on Hive and Hive on Spark or Tez, plus of course MR, there is
really little need for many other products in the same space. It is good to keep things simple.

    On Tuesday, 1 March 2016, 11:33, Mich Talebzadeh <> wrote:

 I do not hear much about Impala anymore. I saw an article on LinkedIn titled
"Apache Hive Or Cloudera Impala? What is Best for me?", which stated:
"We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce
architecture in background for data retrieval and transformation and this results in latency."

My response was:
This statement is no longer valid, as you now have a choice of three engines: MR, Spark
and Tez. I have not used Impala myself, as I don't think there is a need for it when Hive on
Spark, or Spark using the Hive metastore, provides whatever is needed. Hive is for Data
Warehousing and does what it says on the tin. Please also bear in mind that Hive offers ORC
storage files that provide storage index capabilities, further optimizing queries with
additional stats at file, stripe and row group levels.
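
To illustrate the general idea behind such min/max stats, here is a toy sketch in Python. It is
not ORC's actual implementation or file format; the names RowGroup and scan_greater_than are
hypothetical and exist only for this example. It shows how a reader can skip a whole row group
when its stored statistics prove that no row can match the filter.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a block of rows with precomputed min/max stats."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Stats are computed once, at write time, and stored next to the data.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return the values above threshold and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in row_groups:
        # If even the largest value fails the filter, no row in the group can match.
        if group.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


groups = [RowGroup([1, 2, 3]), RowGroup([10, 20, 30]), RowGroup([4, 5, 6])]
result, groups_read = scan_greater_than(groups, 9)
```

In this sketch only one of the three row groups has to be read for the filter "value > 9"; the
other two are skipped on their stats alone.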
Anyway, the question is: with Hive on Spark, or Spark using the Hive metastore, what can we
not achieve that we can achieve with Impala?

Dr Mich Talebzadeh
