Subject: Re: what is Hawq?
From: Atri Sharma
To: user@hawq.incubator.apache.org
Date: Fri, 13 Nov 2015 12:15:57 +0530

+1 for transactions.

I think a major plus point is that HAWQ supports transactions, and this
enables a lot of critical workloads to run on HAWQ.
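To make the transaction point concrete, here is a minimal sketch of an ACID
write against HAWQ from an application, assuming HAWQ's PostgreSQL-compatible
wire protocol, a libpq-based driver (psycopg2), and placeholder connection
details and table names:

    # Minimal sketch: two inserts committed atomically against HAWQ.
    # Assumptions: psycopg2 talking to HAWQ's PostgreSQL-compatible endpoint;
    # the host, database, user, and the orders/order_audit tables are hypothetical.
    import psycopg2

    conn = psycopg2.connect(host="hawq-master.example.com", port=5432,
                            dbname="demo", user="gpadmin")
    try:
        with conn:  # commits on success, rolls back on any exception
            with conn.cursor() as cur:
                cur.execute("INSERT INTO orders VALUES (1, 'widget', 9.99)")
                cur.execute("INSERT INTO order_audit VALUES (1, now())")
    finally:
        conn.close()

If either insert fails, neither row becomes visible, which is the consistency
guarantee the thread below contrasts with Drill.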
On 13 Nov 2015 12:13, "Lei Chang" wrote:
>
> Like what Bob said, HAWQ is a complete database and Drill is just a query
> engine.
>
> And HAWQ also has a lot of other benefits over Drill, for example:
>
> 1. SQL completeness: HAWQ has the most complete SQL support of the
>    SQL-on-Hadoop engines; it can run all TPC-DS queries without any changes
>    and supports almost all third-party tools, such as Tableau et al.
> 2. Performance: proven to be the best in the Hadoop world.
> 3. Scalability: highly scalable via a high-speed UDP-based interconnect.
> 4. Transactions: as far as I know, Drill does not support transactions, and
>    it is a nightmare for end users to maintain consistency without them.
> 5. Advanced resource management: HAWQ has the most advanced resource
>    management. It natively supports YARN and easy-to-use hierarchical
>    resource queues. Resources can be managed and enforced at the query and
>    operator level.
>
> Cheers
> Lei
>
>
> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
>> There are a lot of tools that do a lot of things. Believe me, it's a
>> full-time job keeping track of what is going on in the Apache world. As I
>> understand it, Drill is just a query engine while HAWQ is an actual
>> database... somewhat, anyway.
>>
>> Adaryl "Bob" Wakefield, MBA
>> Principal
>> Mass Street Analytics, LLC
>> 913.938.6685
>> www.linkedin.com/in/bobwakefieldmba
>> Twitter: @BobLovesData
>>
>> *From:* Will Wagner
>> *Sent:* Thursday, November 12, 2015 7:42 AM
>> *To:* user@hawq.incubator.apache.org
>> *Subject:* Re: what is Hawq?
>>
>> Hi Lei,
>>
>> Great answer.
>>
>> I have a follow-up question. Everything HAWQ is capable of doing is
>> already covered by Apache Drill. Why do we need another tool?
>>
>> Thank you,
>> Will W
>>
>> On Nov 12, 2015 12:25 AM, "Lei Chang" wrote:
>>
>>> Hi Bob,
>>>
>>> Apache HAWQ is a Hadoop-native SQL query engine that combines the key
>>> technological advantages of an MPP database with the scalability and
>>> convenience of Hadoop. HAWQ reads data from and writes data to HDFS
>>> natively. HAWQ delivers industry-leading performance and linear
>>> scalability. It provides users the tools to confidently and successfully
>>> interact with petabyte-range data sets. HAWQ provides users with a
>>> complete, standards-compliant SQL interface. More specifically, HAWQ has
>>> the following features:
>>>
>>> - On-premise or cloud deployment
>>> - Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extensions
>>> - Extremely high performance: many times faster than other Hadoop SQL
>>>   engines
>>> - World-class parallel optimizer
>>> - Full transaction capability and consistency guarantees: ACID
>>> - Dynamic data flow engine through a high-speed UDP-based interconnect
>>> - Elastic execution engine based on virtual segments and data locality
>>> - Support for multi-level partitioning and list/range-partitioned tables
>>>   (see the sketch after this list)
>>> - Multiple compression methods: snappy, gzip, quicklz, RLE
>>> - Multi-language user-defined function support: Python, Perl, Java,
>>>   C/C++, R
>>> - Advanced machine learning and data mining functionality through MADlib
>>> - Dynamic node expansion: in seconds
>>> - Most advanced three-level resource management: integrates with YARN and
>>>   hierarchical resource queues
>>> - Easy access to all HDFS data and external system data (for example,
>>>   HBase)
>>> - Hadoop native: from storage (HDFS) and resource management (YARN) to
>>>   deployment (Ambari)
>>> - Authentication & granular authorization: Kerberos, SSL, and role-based
>>>   access
>>> - Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 & libYARN
>>> - Support for most third-party tools: Tableau, SAS et al.
>>> - Standard connectivity: JDBC/ODBC
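As a rough illustration of the list/range partitioning item in the list above,
here is a minimal sketch of a monthly range-partitioned table, assuming the
Greenplum-style DDL that HAWQ inherits and hypothetical table and column
names, issued through the same kind of psycopg2 connection as in the earlier
sketch:

    # Minimal sketch: a range-partitioned fact table in Greenplum-style DDL.
    # Assumptions: table/column names are illustrative; connection details are
    # placeholders for a HAWQ master reachable over the PostgreSQL protocol.
    import psycopg2

    ddl = """
    CREATE TABLE sales (
        id        int,
        region    text,
        sale_date date,
        amount    numeric
    )
    DISTRIBUTED BY (id)
    PARTITION BY RANGE (sale_date)
    (
        START (date '2015-01-01') INCLUSIVE
        END   (date '2016-01-01') EXCLUSIVE
        EVERY (INTERVAL '1 month')
    );
    """

    conn = psycopg2.connect(host="hawq-master.example.com", port=5432,
                            dbname="demo", user="gpadmin")
    with conn, conn.cursor() as cur:
        cur.execute(ddl)
    conn.close()

In this Greenplum-style model the engine maintains one child partition per
month behind the single logical sales table, so queries that filter on
sale_date only need to scan the relevant partitions.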
>>>
>>> The link here can give you more information about HAWQ:
>>> https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ
>>>
>>> And please also see the answers inline to your specific questions:
>>>
>>> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA <
>>> adaryl.wakefield@hotmail.com> wrote:
>>>
>>>> Silly question, right? Thing is, I've read a bit and watched some
>>>> YouTube videos and I'm still not quite sure what I can and can't do with
>>>> HAWQ. Is it a true database, or is it like Hive where I need to use
>>>> HCatalog?
>>>
>>> It is a true database. You can think of it as a parallel Postgres, but
>>> with much more functionality, and it works natively in the Hadoop world.
>>> HCatalog is not necessary, but you can read data registered in HCatalog
>>> with the new "HCatalog integration" feature.
>>>
>>>> Can I write data-intensive applications against it using ODBC? Does it
>>>> enforce referential integrity? Does it have stored procedures?
>>>
>>> ODBC: yes, both JDBC and ODBC are supported.
>>> Referential integrity: currently not supported.
>>> Stored procedures: yes.
>>>
>>>> B.
>>>
>>> Please let us know if you have any other questions.
>>>
>>> Cheers
>>> Lei
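And to make the JDBC/ODBC answer concrete, a minimal connectivity sketch over
ODBC, assuming pyodbc, a pre-configured DSN named "hawq" that points at a
PostgreSQL-compatible ODBC driver, and the hypothetical sales table from the
earlier sketch:

    # Minimal sketch: querying HAWQ over ODBC from an application.
    # Assumptions: a DSN named "hawq" is already configured for a
    # PostgreSQL-compatible ODBC driver; credentials and table are placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=hawq;UID=gpadmin;PWD=changeme")
    cur = conn.cursor()
    cur.execute("SELECT region, sum(amount) AS total FROM sales GROUP BY region")
    for region, total in cur.fetchall():
        print(region, total)
    conn.close()

JDBC access typically looks much the same, with a PostgreSQL-compatible JDBC
driver pointed at the HAWQ master.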