From: Dan Baskette
To: user@hawq.incubator.apache.org
Subject: Re: what is Hawq?
Date: Fri, 13 Nov 2015 09:59:22 -0500

Hive doesn't have the level of SQL support that HAWQ provides, especially around sub-selects. SparkSQL only supports a subset of HiveQL, so the difference there is even bigger.
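
To make "sub-selects" concrete, here is a minimal sketch of a correlated sub-select, the kind of query this comparison is about. It is only an illustration: SQLite from the Python standard library stands in for a real engine so the example runs anywhere, and the `orders` table and its columns are hypothetical. The thread does not say whether a particular Hive or Spark SQL release accepts this exact form.

```python
import sqlite3

# SQLite is used purely as a stand-in so the sketch is runnable;
# the table and column names are hypothetical, not from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'acme', 100.0), (2, 'acme', 300.0),
        (3, 'bolt', 50.0),  (4, 'bolt', 70.0);
""")

# Correlated sub-select: the inner query refers to the outer row (o.customer),
# so each order is compared with the average amount of its own customer.
above_average = conn.execute("""
    SELECT o.id, o.customer, o.amount
    FROM orders AS o
    WHERE o.amount > (SELECT AVG(i.amount)
                      FROM orders AS i
                      WHERE i.customer = o.customer)
    ORDER BY o.id
""").fetchall()

print(above_average)
```

An engine with limited sub-select support would typically need this rewritten as a join against a pre-aggregated table.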

On Nov 13, 2015, at 9:39 AM, Biswas, Supriya <Supriya.Biswas@nielsen.com> wrote:

Hello All –

 

Hive 0.14 supports ACID and also supports transactions. Spark supports Hive queries (HQL).

 

Did anyone compare HAWQ with Spark SQL or Hive HQL on Spark?

 

Thanks.

 

Supriyo Biswas
Architect – CPS Service Delivery
The Nielsen Company
Office (516) 682-6021/NETS 249-6021

Cell (516) 353-6795
www.nielsen.com

 

From: Atri Sharma [mailto:atri@apache.org]
Sent: Friday, November 13, 2015 3:53 AM
To: user@hawq.incubator.apache.org
Subject: Re: what is Hawq?

 

Greenplum is open sourced.

The main difference between the two engines is that HAWQ is more for Hadoop-based systems, whereas Greenplum is geared more towards a regular file system. This is a very high-level distinction; the actual differences are more detailed, but that is the one-line summary.

On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA" <adaryl.wakefield@hotmail.com> wrote:

Is Greenplum free? I heard they open sourced it but I haven’t found anything but a community edition.

 

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

 

From: dortmont

Sent: Friday, November 13, 2015 2:42 AM

Subject: Re: what is Hawq?

 

I see the advantage of HAWQ compared to other Hadoop SQL engines. It looks like the most mature solution on Hadoop thanks to the PostgreSQL-based engine.

 

But why wouldn't I use Greenplum instead of HAWQ? It has even better performance and it supports updates.


Cheers

 

2015-11-13 7:45 GMT+01:00 Atri Sharma <atri@apache.org>:

+1 for transactions.

I think a major plus point is that HAWQ supports transactions, and this enables a lot of critical workloads to be done on HAWQ.

On 13 Nov 2015 12:13, "Lei Chang" <chang.lei.cn@gmail.com> wrote:

 

As Bob said, HAWQ is a complete database and Drill is just a query engine.

 

HAWQ also has a lot of other benefits over Drill, for example:

 

1. SQL completeness: HAWQ is the most complete of the SQL-on-Hadoop engines and can run all TPC-DS queries without any changes. It also supports almost all third-party tools, such as Tableau.

2. Performance: proven the best in the Hadoop world.

3. Scalability: highly scalable via a high-speed UDP-based interconnect.

4. Transactions: as far as I know, Drill does not support transactions. Without them, it is a nightmare for end users to keep data consistent.

5. Advanced resource management: HAWQ has the most advanced resource management. It natively supports YARN and easy-to-use hierarchical resource queues. Resources can be managed and enforced at the query and operator level.
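
Point 4 is easier to see with a small example. The sketch below shows what a transaction buys an application: a multi-statement change either takes effect as a whole or not at all. It is an illustration only, using SQLite from the Python standard library as a stand-in rather than HAWQ itself; the `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

# SQLite stands in for a transactional database; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo everything if any step fails."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


transfer(conn, "a", "b", 30)
try:
    transfer(conn, "a", "b", 500)  # fails after the debit; the debit is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

Without transactions, the failed second transfer would leave the debit applied and the credit missing, and the application would have to detect and repair that state itself.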

 

Cheers

Lei

 

 

On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com> wrote:

There are a lot of tools that do a lot of things. Believe me, it’s a full-time job keeping track of what is going on in the Apache world. As I understand it, Drill is just a query engine while Hawq is an actual database... somewhat, anyway.

 

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

 

From: Will Wagner

Sent: Thursday, November 12, 2015 7:42 AM

Subject: Re: what is Hawq?

 

Hi Lei,

Great answer.

I have a follow up question.
Everything HAWQ is capable of doing is already covered by Apache Drill. Why do we need another tool?

Thank you,
Will W

On Nov 12, 2015 12:25 AM, "Lei Chang" <chang.lei.cn@gmail.com> wrote:

 

Hi Bob,

 

Apache HAWQ is a Hadoop-native SQL query engine that combines the key technological advantages of an MPP database with the scalability and convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively. HAWQ delivers industry-leading performance and linear scalability. It provides users the tools to confidently and successfully interact with petabyte-range data sets. HAWQ provides users with a complete, standards-compliant SQL interface. More specifically, HAWQ has the following features:

· On-premise or cloud deployment
· Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension
· Extremely high performance: many times faster than other Hadoop SQL engines
· World-class parallel optimizer
· Full transaction capability and consistency guarantee: ACID
· Dynamic data flow engine through a high-speed UDP-based interconnect
· Elastic execution engine based on virtual segments and data locality
· Support for multi-level partitioning and List/Range-based partitioned tables
· Multiple compression methods: snappy, gzip, quicklz, RLE
· Multi-language user-defined function support: Python, Perl, Java, C/C++, R
· Advanced machine learning and data mining functionality through MADlib
· Dynamic node expansion: in seconds
· Most advanced three-level resource management: integrates with YARN and hierarchical resource queues
· Easy access to all HDFS data and external system data (for example, HBase)
· Hadoop native: from storage (HDFS) and resource management (YARN) to deployment (Ambari)
· Authentication and granular authorization: Kerberos, SSL and role-based access
· Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 and libYARN
· Support for most third-party tools: Tableau, SAS et al.
· Standard connectivity: JDBC/ODBC
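
As a rough illustration of the range-partitioning item in the list above, the sketch below shows the general idea of routing a row to a partition by comparing its key against sorted partition bounds. It is a conceptual model only, not HAWQ's implementation: the monthly bounds, the partition names and the `route` helper are all hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly partitions. Each bound is the inclusive start of a
# partition; the last partition is open-ended in this simplified model.
bounds = [date(2015, 1, 1), date(2015, 2, 1), date(2015, 3, 1)]
names = ["p2015_01", "p2015_02", "p2015_03"]


def route(day):
    """Return the partition whose range contains day, or None if it precedes all bounds."""
    index = bisect_right(bounds, day) - 1
    return names[index] if index >= 0 else None


print(route(date(2015, 2, 14)))
print(route(date(2014, 12, 31)))
```

The same lookup is what lets a query with a predicate on the partition key skip partitions that cannot contain matching rows.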

 

This link has more information about HAWQ: https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ

 

 

And please also see the answers inline to your specific questions:

 

On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com> wrote:

Silly question right? Thing is I’ve read a bit and watched some YouTube videos and I’m still not quite sure what I can and can’t do with Hawq. Is it a true database or is it like Hive where I need to use HCatalog?

 

It is a true database. You can think of it as a parallel Postgres, but with much more functionality, and it works natively in the Hadoop world. HCatalog is not necessary, but you can read data registered in HCatalog with the new "HCatalog integration" feature.

 

Can I write data intensive applications against it using ODBC? Does it enforce referential integrity? Does it have stored procedures?

 

ODBC: yes, both JDBC/ODBC are supported

Referential integrity: currently not supported.

Stored procedures: yes.
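
Since referential integrity is currently not enforced, an application that depends on it would presumably have to check for orphaned rows itself. The sketch below shows one common way to do that, an anti-join that finds child rows with no matching parent. It is an assumption about how one might work around the limitation, not a documented HAWQ procedure; SQLite from the Python standard library is used as a stand-in, and the `customers` and `orders` tables and the `find_orphans` helper are hypothetical.

```python
import sqlite3

# SQLite stands in for the database; no foreign key is declared or enforced,
# which mirrors the situation described in the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'acme'), (2, 'bolt');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3);  -- customer 3 does not exist
""")


def find_orphans(conn):
    """Return ids of orders whose customer_id matches no row in customers."""
    rows = conn.execute("""
        SELECT o.id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.id = o.customer_id
        WHERE c.id IS NULL
        ORDER BY o.id
    """)
    return [row[0] for row in rows]


orphans = find_orphans(conn)
print(orphans)
```

A check like this could run after each load, with the load rejected or flagged when the list is not empty.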

 

B.

 

 

Please let us know if you have any other questions.

 

Cheers

Lei

 

 

 

 
