hadoop-user mailing list archives

From Daniel Haviv <danielru...@gmail.com>
Subject Re: Which [open-source] SQL engine atop Hadoop?
Date Tue, 27 Jan 2015 09:13:29 GMT
Can you elaborate on why you prefer Tajo?


> On 27 Jan 2015, at 10:35, Azuryy Yu <azuryyyu@gmail.com> wrote:
> You have listed almost all of the open-source MPP real-time SQL-on-Hadoop engines.
> I prefer Tajo, which recently released version 0.9.0 and is still a work in progress for
>> On Mon, Jan 26, 2015 at 10:19 PM, Samuel Marks <samuelmarks@gmail.com> wrote:
>> Since Hadoop came out, there have been various commercial and/or open-source attempts to expose some compatibility with SQL.
>> I am seeking one which is good for low-latency querying, and supports the most common CRUD, including [the basics!] along these lines: CREATE TABLE, INSERT INTO, SELECT * FROM,
>> I will be utilising them from Python; however, there does seem to be a Python JDBC wrapper. Additionally, it needs to be scalable for big and small data (starting on a single-node
>> Here is what I've found thus far:
>> Apache Hive (SQL-like, with interactive SQL thanks to the Stinger initiative)
>> Apache Drill (ANSI SQL support)
>> Apache Spark (Spark SQL, queries only, add data via Hive, RDD or Parquet)
>> Apache Phoenix (built atop Apache HBase, lacks full transaction support, relational operators and some built-in functions)
>> Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. Doesn't seem to be designed for low-latency responses across small clusters, or to support UPDATE operations. It is optimized for data warehousing or analytics¹)
>> SQL-Hadoop via MapR community edition (seems to be a packaging of Hive, HP Vertica, SparkSQL, Drill and a native ODBC wrapper)
>> Apache Kylin from eBay (provides an SQL interface and multi-dimensional analysis [OLAP], "… offers ANSI SQL on Hadoop and supports most ANSI SQL query functions". It depends on HDFS, MapReduce, Hive and HBase, and seems targeted at very large data-sets, though it maintains low query latency)
>> Apache Tajo (ANSI/ISO SQL standard compliance with JDBC driver support [benchmarks against Hive and Impala])
>> Cascading's Lingual² ("Lingual provides JDBC Drivers, a SQL command shell, and a catalog manager for publishing files [or any resource] as schemas and tables.")
>> Which—from this list or elsewhere—would you recommend, and why?
>> Thanks for all suggestions,
>> Samuel Marks
>> http://linkedin.com/in/samuelmarks
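The question's baseline requirement (CREATE TABLE, INSERT INTO and SELECT * FROM issued from Python through a JDBC wrapper) could be checked per engine with a small smoke test. This is a minimal sketch, assuming the wrapper exposes a standard Python DB-API 2.0 connection (cursor(), execute(), fetchall()). The helper `crud_smoke_test` and the table name `crud_probe` are hypothetical, and sqlite3 is used only as a local, in-memory stand-in so the sketch runs without a cluster. Real engines will likely need engine-specific DDL (type names, storage or table-format clauses).

```python
import sqlite3


def crud_smoke_test(conn, table="crud_probe"):
    """Hypothetical helper: run the basic statements named in the question
    against any DB-API 2.0 connection and return the rows read back.

    Values are inlined as literals rather than bound parameters, because
    DB-API drivers differ in their paramstyle (qmark, format, named, ...).
    The table name is fixed and trusted here; never interpolate user input.
    """
    cur = conn.cursor()
    try:
        # CREATE TABLE: plain column types; an engine may require its own DDL.
        cur.execute(f"CREATE TABLE {table} (id INT, name VARCHAR(32))")
        # INSERT INTO: two literal rows.
        cur.execute(f"INSERT INTO {table} VALUES (1, 'alpha')")
        cur.execute(f"INSERT INTO {table} VALUES (2, 'beta')")
        # SELECT * FROM: ORDER BY makes the row order deterministic.
        cur.execute(f"SELECT * FROM {table} ORDER BY id")
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    # sqlite3 stands in for a JDBC-backed connection (assumption for the demo).
    conn = sqlite3.connect(":memory:")
    print(crud_smoke_test(conn))  # [(1, 'alpha'), (2, 'beta')]
    conn.close()
```

Passing in the connection object keeps the check independent of any one engine; swapping sqlite3 for a JDBC-wrapped connection to the engine under evaluation is the only change the sketch anticipates.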
