hive-user mailing list archives

From Siddharth Tiwari <>
Subject Re: Which [open-source] SQL engine atop Hadoop?
Date Fri, 30 Jan 2015 12:35:51 GMT
Have you looked at HAWQ from Pivotal?


> On Jan 30, 2015, at 4:27 AM, Samuel Marks <> wrote:
> Since Hadoop came out, there have been various commercial and/or open-source attempts to expose some compatibility with SQL. Obviously by posting here I am not expecting an unbiased
> I am seeking an SQL-on-Hadoop offering that provides low-latency querying and supports the most common CRUD operations, including [the basics!] along these lines: CREATE TABLE, INSERT INTO, SELECT * FROM, UPDATE Table SET C1=2 WHERE, DELETE FROM, and DROP TABLE. Transactional support would also be nice, but is not a must-have.
> Essentially I want a full replacement for the more traditional RDBMS, one which can scale from 1 node to a serious Hadoop cluster.
> Python is my language of choice for interfacing; however, there does seem to be a Python JDBC wrapper.
> Here is what I've found thus far:
> Apache Hive (SQL-like, with interactive SQL thanks to the Stinger initiative)
> Apache Drill (ANSI SQL support)
> Apache Spark (Spark SQL, queries only, add data via Hive, RDD or Parquet)
> Apache Phoenix (built atop Apache HBase, lacks full transaction support, relational operators and some built-in functions)
> Cloudera Impala (significant HiveQL support, some SQL language support, no support for indexes on its tables, importantly missing DELETE, UPDATE and INTERSECT, amongst others)
> Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. It does not seem to be designed for low-latency responses across small clusters, or to support UPDATE operations. It is optimized for data warehousing or analytics¹)
> SQL-Hadoop via MapR community edition (seems to be a packaging of Hive, HP Vertica, SparkSQL, Drill and a native ODBC wrapper)
> Apache Kylin from eBay (provides an SQL interface and multi-dimensional analysis [OLAP], "… offers ANSI SQL on Hadoop and supports most ANSI SQL query functions". It depends on HDFS, MapReduce, Hive and HBase, and seems targeted at very large datasets while maintaining low query latency)
> Apache Tajo (ANSI/ISO SQL standard compliance with JDBC driver support [benchmarks against Hive and Impala])
> Cascading's Lingual² ("Lingual provides JDBC Drivers, a SQL command shell, and a catalog manager for publishing files [or any resource] as schemas and tables.")
> Which—from this list or elsewhere—would you recommend, and why?
> Thanks for all suggestions,
> Samuel Marks
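The CRUD checklist in the quoted question can be turned into a quick probe to run against each candidate engine. This is a minimal sketch, not something proposed in the thread: it assumes the engine under evaluation is reachable through a Python DB-API 2.0 connection (the question mentions only a Python JDBC wrapper), uses a hypothetical table `crud_probe` with hypothetical columns `id` and `c1`, and uses the standard-library `sqlite3` module purely as a stand-in for a traditional RDBMS.

```python
import sqlite3

# Statement set taken from the requirements listed in the question.
# The table name "crud_probe" and columns "id"/"c1" are hypothetical.
CRUD_STATEMENTS = [
    ("CREATE TABLE", "CREATE TABLE crud_probe (id INTEGER, c1 INTEGER)"),
    ("INSERT INTO", "INSERT INTO crud_probe (id, c1) VALUES (1, 1)"),
    ("SELECT * FROM", "SELECT * FROM crud_probe"),
    ("UPDATE ... SET ... WHERE", "UPDATE crud_probe SET c1 = 2 WHERE id = 1"),
    ("DELETE FROM", "DELETE FROM crud_probe WHERE id = 1"),
    ("DROP TABLE", "DROP TABLE crud_probe"),
]


def probe_crud(conn):
    """Run each statement on a DB-API 2.0 connection; return {label: succeeded}.

    Assumption: the engine under test exposes a DB-API 2.0 connection.
    Exceptions are caught broadly because each driver defines its own
    error classes; a False entry only means "this statement failed here".
    """
    results = {}
    cur = conn.cursor()
    for label, sql in CRUD_STATEMENTS:
        try:
            cur.execute(sql)
            if sql.startswith("SELECT"):
                cur.fetchall()  # drain the result set before the next statement
            results[label] = True
        except Exception:
            results[label] = False
    cur.close()
    return results


if __name__ == "__main__":
    # sqlite3 is only a stand-in for a traditional RDBMS here.
    conn = sqlite3.connect(":memory:")
    for label, ok in probe_crud(conn).items():
        print(f"{label:<26} {'ok' if ok else 'FAILED'}")
    conn.close()
```

Against an engine with the kind of gap the question describes for Impala (no DELETE or UPDATE), those entries would come back `False`. Because a failed CREATE TABLE makes the later statements fail too, entries after the first `False` should be read with that in mind.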