hawq-user mailing list archives

From "Adaryl \"Bob\" Wakefield, MBA" <adaryl.wakefi...@hotmail.com>
Subject Re: what is Hawq?
Date Fri, 13 Nov 2015 08:50:06 GMT
Is Greenplum free? I heard they open sourced it but I haven’t found anything but a community

Adaryl "Bob" Wakefield, MBA
Mass Street Analytics, LLC
Twitter: @BobLovesData

From: dortmont 
Sent: Friday, November 13, 2015 2:42 AM
To: user@hawq.incubator.apache.org 
Subject: Re: what is Hawq?

I see the advantage of HAWQ compared to other Hadoop SQL engines. It looks like the most mature
solution on Hadoop thanks to the PostgreSQL-based engine.

But why wouldn't I use Greenplum instead of HAWQ? It has even better performance and it supports


2015-11-13 7:45 GMT+01:00 Atri Sharma <atri@apache.org>:

  +1 for transactions.

  I think a major plus point is that HAWQ supports transactions, and this enables a lot of
critical workloads to be done on HAWQ.

  On 13 Nov 2015 12:13, "Lei Chang" <chang.lei.cn@gmail.com> wrote:

    As Bob said, HAWQ is a complete database and Drill is just a query engine.

    HAWQ also has a lot of other benefits over Drill, for example:

    1. SQL completeness: HAWQ is the best of the SQL-on-Hadoop engines and can run all TPC-DS
queries without any changes. It also supports almost all third party tools, such as Tableau et
    2. Performance: proven the best in the Hadoop world.
    3. Scalability: highly scalable via a high-speed UDP-based interconnect.
    4. Transactions: as far as I know, Drill does not support transactions. Without them it is a
nightmare for end users to keep data consistent.

    5. Advanced resource management: HAWQ has the most advanced resource management. It natively
supports YARN and easy-to-use hierarchical resource queues. Resources can be managed and enforced
at the query and operator level.


    On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com>

      There are a lot of tools that do a lot of things. Believe me, it’s a full-time job
keeping track of what is going on in the Apache world. As I understand it, Drill is just a
query engine while Hawq is an actual database... somewhat, anyway.

      Adaryl "Bob" Wakefield, MBA
      Mass Street Analytics, LLC
      Twitter: @BobLovesData

      From: Will Wagner 
      Sent: Thursday, November 12, 2015 7:42 AM
      To: user@hawq.incubator.apache.org 
      Subject: Re: what is Hawq?

      Hi Lei,

      Great answer. 

      I have a follow up question. 
      Everything HAWQ is capable of doing is already covered by Apache Drill.  Why do we need
another tool?

      Thank you, 
      Will W 

      On Nov 12, 2015 12:25 AM, "Lei Chang" <chang.lei.cn@gmail.com> wrote:

        Hi Bob, 

        Apache HAWQ is a Hadoop-native SQL query engine that combines the key technological
advantages of an MPP database with the scalability and convenience of Hadoop. HAWQ reads data
from and writes data to HDFS natively. HAWQ delivers industry-leading performance and linear
scalability. It gives users the tools to confidently and successfully interact with petabyte-range
data sets. HAWQ provides users with a complete, standards-compliant SQL interface. More
specifically, HAWQ has the following features:

          a. On-premise or cloud deployment
          b. Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension
          c. Extremely high performance: many times faster than other Hadoop SQL engines
          d. World-class parallel optimizer
          e. Full transaction capability and consistency guarantee: ACID
          f. Dynamic data flow engine through a high-speed UDP-based interconnect
          g. Elastic execution engine based on virtual segments and data locality
          h. Support for multi-level partitioning and list/range-based partitioned tables
          i. Multiple compression methods: snappy, gzip, quicklz, RLE
          j. Multi-language user-defined function support: Python, Perl, Java, C/C++, R
          k. Advanced machine learning and data mining functionality through MADlib
          l. Dynamic node expansion: in seconds
          m. Most advanced three-level resource management: integrates with YARN and hierarchical
resource queues
          n. Easy access to all HDFS data and external system data (for example, HBase)
          o. Hadoop Native: from storage (HDFS), resource management (YARN) to deployment
          p. Authentication & Granular authorization: Kerberos, SSL and role based access
          q. Advanced C/C++ access library to HDFS and YARN: libhdfs3 & libYARN
          r. Support for most third party tools: Tableau, SAS et al.
          s. Standard connectivity: JDBC/ODBC

        The link here gives more information about HAWQ: https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ

        And please also see the answers inline to your specific questions:

        On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com>

          Silly question, right? Thing is, I’ve read a bit and watched some YouTube videos
and I’m still not quite sure what I can and can’t do with Hawq. Is it a true database
or is it like Hive where I need to use HCatalog?

        It is a true database. You can think of it as a parallel PostgreSQL with much
more functionality, and it works natively in the Hadoop world. HCatalog is not necessary, but
you can read data registered in HCatalog with the new "hcatalog integration" feature.

          Can I write data intensive applications against it using ODBC? Does it enforce referential
integrity? Does it have stored procedures?

        ODBC: yes, both JDBC/ODBC are supported
        referential integrity: currently not supported.
        Stored procedures: yes.
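
Since referential integrity is currently not enforced, an application that needs it would have to look for orphaned rows itself. Below is a minimal sketch of one way to do that, not an official HAWQ recipe. The table and column names (customers, orders, customer_id) and the helpers find_orphans and demo are hypothetical, and the in-memory sqlite3 database is only a stand-in so the example runs without a cluster. Against HAWQ, the same anti-join would presumably be sent over a JDBC/ODBC or PostgreSQL-compatible connection instead.

```python
import sqlite3

# Anti-join: child rows whose foreign key value has no matching parent row.
# Table and column names are hypothetical, chosen only for illustration.
ORPHAN_QUERY = """
SELECT o.order_id, o.customer_id
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL
ORDER BY o.order_id
"""


def find_orphans(conn):
    """Return (order_id, customer_id) pairs that reference a missing customer."""
    cur = conn.cursor()
    cur.execute(ORPHAN_QUERY)
    return cur.fetchall()


def demo():
    # Assumption: sqlite3 stands in for the real database so the sketch is runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "a"), (2, "b")]
    )
    # Orders 11 and 13 point at customers that do not exist.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(10, 1), (11, 3), (12, 2), (13, 4)],
    )
    return find_orphans(conn)


if __name__ == "__main__":
    for order_id, customer_id in demo():
        print(f"order {order_id} references missing customer {customer_id}")
```

A check like this could run after each load job, with the load treated as failed if any rows come back.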


        Please let us know if you have any other questions.

