hive-user mailing list archives

From Mich Talebzadeh
Subject Hive-Hbase vs Phoenix-Hbase
Date Thu, 05 May 2016 21:55:12 GMT

On this topic, the conclusion all along seems to be (quote):

   1. The Hive is batch oriented(aka slow), it transfer the SQL query to
   MapReduce jobs, it mostly used in offline data processing.
   2. The Phoenix is a SQL layer between applications and Hbase, it provide
   ad-hoc queries in real time.

Fine, those were notes from 2014 in here.
The Phoenix web page says:

"The Phoenix query engine transforms your SQL query into one or more Hbase
scans, and orchestrates their execution to produce standard JDBC result
sets. Direct use of the Hbase API, along with coprocessors and custom
filters, results in performance on the order of milliseconds for small
queries, or seconds for tens of millions of rows."

But things have changed now. MapReduce (MR) is deprecated in Hive 2, and one
can more or less get the same in-memory speed with the Hive on Spark engine.
With that engine Hive does not convert the query to MR jobs; it uses Spark,
which combines DAG execution with in-memory processing. Sure, with Phoenix we
are talking about OLTP speed and single-row DML, but Hive offers a variety of
table formats.

Bottom line: have there been any recent comparative studies to gauge the
performance of Hive on Spark vs Phoenix?
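
In the absence of a published study, a rough comparison is easy to run
yourself. The following is only a minimal sketch in Python, assuming you
already have a DB-API 2.0 connection to each system through whatever driver
you use. The names time_query, summarise and demo are hypothetical helpers,
and sqlite3 merely stands in for a real Hive or Phoenix connection so that
the snippet runs as-is. On a Hive session one would pass something like
"set hive.execution.engine=spark" as a setup statement.

```python
import sqlite3
import statistics
import time


def time_query(conn, query, setup=(), repeats=5):
    """Time `query` on a DB-API connection; returns a list of seconds per run.

    `setup` holds session statements executed once before timing starts,
    for example "set hive.execution.engine=spark" on a Hive connection.
    """
    cur = conn.cursor()
    for stmt in setup:
        cur.execute(stmt)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        cur.execute(query)
        cur.fetchall()  # include the time needed to pull the full result set
        timings.append(time.perf_counter() - start)
    return timings


def summarise(label, timings):
    """Format the median and best run for one engine."""
    return (f"{label}: median {statistics.median(timings):.4f}s, "
            f"min {min(timings):.4f}s over {len(timings)} runs")


def demo():
    # sqlite3 is only a stand-in here; swap in real Hive/Phoenix connections.
    conn = sqlite3.connect(":memory:")
    setup = [
        "CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)",
        "INSERT INTO t (v) VALUES (1), (2), (3)",
    ]
    timings = time_query(conn, "SELECT COUNT(*) FROM t", setup=setup, repeats=3)
    conn.close()
    return timings


if __name__ == "__main__":
    print(summarise("stand-in engine", demo()))
```

Running the same query several times and reporting the median matters here,
since the first run on either system typically pays for session start-up and
cold caches.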

It sounds like the distinguishing feature is that Phoenix does an
asynchronous write to Hbase, leaves Hbase to handle completion of the work
through its API and, as expected, relies on memory (what else) to speed up
this process. Hive on a newer engine can do most of this these days.


Dr Mich Talebzadeh
