hawq-user mailing list archives

From Bob Marshall <marsha...@avalonconsult.com>
Subject Re: what is Hawq?
Date Fri, 13 Nov 2015 15:56:03 GMT
When you talk about ALL of the SQL-on-Hadoop tools, you need to separate
out the features implemented by the SQL tool itself (e.g. SQL-89, SQL-92,
or SQL-2003 compliance) from what is implemented in the underlying Hadoop
layer that the tool is querying, such as columnar storage in HBase,
compression in HDFS, etc.
The user who wrote the line about tracking SQL-on-Hadoop tools being a
full-time job was right. As a Hadoop Architect who works for both Cloudera
and Hortonworks Professional Services, I always recommend that my clients
build a spreadsheet of all the claims, verify each tool's implementation
against the client's needs, and then run an in-house POC "shoot-off"
between the top 2 or 3 tools with real-world workloads to see how they
perform against their own data. Never rely on a salesman's or a
developer's claims.
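A minimal sketch of the claims-versus-needs spreadsheet described above, written in Python purely for illustration. The tool names, feature labels, and weights are hypothetical placeholders, not real vendor claims, and the weighted-sum scoring is an assumption rather than a prescribed method. It ranks tools by the weighted requirements they claim to meet and keeps the top few for the in-house POC, where those claims still have to be verified against real workloads.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    # Features the vendor or project *claims*; still unverified until the POC.
    claimed: set[str] = field(default_factory=set)


def coverage(tool: Tool, needs: dict[str, int]) -> int:
    """Sum the weights of the client's needs that the tool claims to meet."""
    return sum(weight for feature, weight in needs.items() if feature in tool.claimed)


def shortlist(tools: list[Tool], needs: dict[str, int], n: int = 3) -> list[str]:
    """Rank tools by weighted claim coverage and keep the top n for a POC shoot-off."""
    ranked = sorted(tools, key=lambda t: coverage(t, needs), reverse=True)
    return [t.name for t in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical requirement weights (1-5) for one client.
    needs = {"sql-2003": 5, "transactions": 4, "yarn": 3, "jdbc/odbc": 5}
    # Hypothetical tools and claims; replace with the real spreadsheet rows.
    tools = [
        Tool("Tool A", {"sql-2003", "transactions", "jdbc/odbc"}),
        Tool("Tool B", {"jdbc/odbc", "yarn"}),
        Tool("Tool C", {"sql-2003", "yarn", "jdbc/odbc", "transactions"}),
        Tool("Tool D", {"yarn"}),
    ]
    print(shortlist(tools, needs, n=2))  # prints ['Tool C', 'Tool A']
```

The scores only decide which tools earn a POC slot; the shoot-off on the client's own data is what actually settles the choice.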
All of the tools are either open source or will be offered for a POC by the
vendor. And if the choice of tool will influence the choice of distribution,
e.g. Impala (Cloudera) vs. HAWQ (Hortonworks/Pivotal), then perhaps the cart
is being placed before the horse. Are you willing to sacrifice more mature
management capabilities for more robust SQL performance? Do you need mature
data governance functions?
And the real kicker is that the Hadoop space is so dynamic that any design
and list of capabilities will be outdated in 6-12 months. Are you designing
for an on-premises deployment? Cloudera's strategic direction is the cloud
within 12-18 months. Will Kudu replace HBase for low-latency retrieval? Will
the in-memory paradigm of Spark replace MapReduce? Spark SQL and SparkR are
both immature and not ready for production, but what about in 12 months?
Choosing distributions and tools in the Hadoop space is complex and will be
for the foreseeable future.

Robert Marshall
Sr Hadoop Consultant
Avalon Consulting LLC
469-424-3449



On Friday, November 13, 2015, Dan Baskette <dbbaskette@gmail.com> wrote:

> Hive doesn't have the level of SQL support that HAWQ provides, especially
> around sub-selects. Spark SQL only supports a subset of HiveQL, so the
> difference there is even bigger.
>
> On Nov 13, 2015, at 9:39 AM, Biswas, Supriya <Supriya.Biswas@nielsen.com>
> wrote:
>
> Hello All –
>
> Hive 0.14 supports ACID transactions, and Spark supports Hive queries
> (HQL).
>
> Has anyone compared HAWQ with Spark SQL, or with Hive (HQL) on Spark?
>
> Thanks.
>
> *Supriyo Biswas* Architect – CPS Service Delivery
> The Nielsen Company
> Office (516) 682-6021/NETS 249-6021
>
> Cell     (516) 353-6795
> www.nielsen.com
>
>
>
> *From:* Atri Sharma [mailto:atri@apache.org]
> *Sent:* Friday, November 13, 2015 3:53 AM
> *To:* user@hawq.incubator.apache.org
> *Subject:* Re: what is Hawq?
>
>
>
> Greenplum is open sourced.
>
> The main difference between the two engines is that HAWQ is aimed at
> Hadoop-based systems, whereas Greenplum is aimed more at regular file
> systems. This is a very high-level view; the actual differences are more
> detailed, but that is the one-line summary.
>
> On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA" <
> adaryl.wakefield@hotmail.com> wrote:
>
> Is Greenplum free? I heard they open sourced it but I haven’t found
> anything but a community edition.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
>
>
> *From:* dortmont <dortmont@gmail.com>
> *Sent:* Friday, November 13, 2015 2:42 AM
> *To:* user@hawq.incubator.apache.org
> *Subject:* Re: what is Hawq?
>
>
>
> I see the advantage of HAWQ compared to other Hadoop SQL engines. It looks
> like the most mature solution on Hadoop, thanks to its PostgreSQL-based
> engine.
>
>
>
> But why wouldn't I use Greenplum instead of HAWQ? It has even better
> performance and it supports updates.
>
>
> Cheers
>
>
>
> 2015-11-13 7:45 GMT+01:00 Atri Sharma <atri@apache.org>:
>
> +1 for transactions.
>
> I think a major plus point is that HAWQ supports transactions, and this
> enables a lot of critical workloads to run on HAWQ.
>
> On 13 Nov 2015 12:13, "Lei Chang" <chang.lei.cn@gmail.com> wrote:
>
>
>
> As Bob said, HAWQ is a complete database, and Drill is just a query
> engine.
>
>
>
> HAWQ also has a lot of other benefits over Drill, for example:
>
>
>
> 1. SQL completeness: HAWQ is the most SQL-complete of the SQL-on-Hadoop
> engines; it can run all TPC-DS queries without any changes, and it
> supports almost all third-party tools, such as Tableau.
>
> 2. Performance: proven to be the best in the Hadoop world.
>
> 3. Scalability: highly scalable via a high-speed, UDP-based interconnect.
>
> 4. Transactions: as far as I know, Drill does not support transactions,
> which makes keeping data consistent a nightmare for end users.
>
> 5. Advanced resource management: HAWQ has the most advanced resource
> management. It natively supports YARN and easy-to-use hierarchical
> resource queues. Resources can be managed and enforced at the query and
> operator level.
>
>
>
> Cheers
>
> Lei
>
>
>
>
>
> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
> There are a lot of tools that do a lot of things. Believe me, it’s a
> full-time job keeping track of what is going on in the Apache world. As I
> understand it, Drill is just a query engine, while HAWQ is an actual
> database... somewhat, anyway.
>
>
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
>
>
> *From:* Will Wagner <wowagner@gmail.com>
> *Sent:* Thursday, November 12, 2015 7:42 AM
> *To:* user@hawq.incubator.apache.org
> *Subject:* Re: what is Hawq?
>
>
>
> Hi Lei,
>
> Great answer.
>
> I have a follow up question.
> Everything HAWQ is capable of doing is already covered by Apache Drill.
> Why do we need another tool?
>
> Thank you,
> Will W
>
> On Nov 12, 2015 12:25 AM, "Lei Chang" <chang.lei.cn@gmail.com> wrote:
>
>
>
> Hi Bob,
>
>
>
> Apache HAWQ is a Hadoop-native SQL query engine that combines the key
> technological advantages of an MPP database with the scalability and
> convenience of Hadoop. HAWQ reads data from and writes data to HDFS
> natively. HAWQ delivers industry-leading performance and linear
> scalability. It provides users the tools to confidently and successfully
> interact with petabyte-range data sets. HAWQ provides users with a
> complete, standards-compliant SQL interface. More specifically, HAWQ has
> the following features:
>
> ·         On-premises or cloud deployment
>
> ·         Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP
> extensions
>
> ·         Extremely high performance: many times faster than other Hadoop
> SQL engines
>
> ·         World-class parallel optimizer
>
> ·         Full transaction capability and consistency guarantees: ACID
>
> ·         Dynamic data flow engine through a high-speed, UDP-based
> interconnect
>
> ·         Elastic execution engine based on virtual segments & data
> locality
>
> ·         Multi-level partitioning and List/Range-based partitioned
> tables
>
> ·         Multiple compression methods: snappy, gzip, quicklz, RLE
>
> ·         Multi-language user-defined function support: Python, Perl,
> Java, C/C++, R
>
> ·         Advanced machine learning and data mining functionality
> through MADlib
>
> ·         Dynamic node expansion in seconds
>
> ·         Most advanced three-level resource management: integration with
> YARN and hierarchical resource queues
>
> ·         Easy access to all HDFS data and external system data (for
> example, HBase)
>
> ·         Hadoop native: from storage (HDFS) and resource management
> (YARN) to deployment (Ambari)
>
> ·         Authentication & granular authorization: Kerberos, SSL and
> role-based access
>
> ·         Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 &
> libYARN
>
> ·         Support for most third-party tools: Tableau, SAS, et al.
>
> ·         Standard connectivity: JDBC/ODBC
>
>
>
> The link below gives more information about HAWQ:
> https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ
>
> And please also see the answers inline to your specific questions:
>
>
>
> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA <
> adaryl.wakefield@hotmail.com> wrote:
>
> Silly question, right? The thing is, I’ve read a bit and watched some
> YouTube videos, and I’m still not quite sure what I can and can’t do with
> HAWQ. Is it a true database, or is it like Hive, where I need to use
> HCatalog?
>
>
>
> It is a true database. You can think of it as a parallel Postgres with
> much more functionality, and it works natively in the Hadoop world.
> HCatalog is not necessary, but you can read data registered in HCatalog
> with the new "HCatalog integration" feature.
>
>
>
> Can I write data intensive applications against it using ODBC? Does it
> enforce referential integrity? Does it have stored procedures?
>
>
>
> ODBC: yes, both JDBC and ODBC are supported.
>
> Referential integrity: currently not supported.
>
> Stored procedures: yes.
>
>
>
> B.
>
>
>
>
>
> Please let us know if you have any other questions.
>
>
>
> Cheers
>
> Lei
>

-- 
Robert L Marshall
Senior Consultant | Avalon Consulting, LLC
<http://www.avalonconsult.com/>
c: (210) 853-7041
LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
<http://www.google.com/+AvalonConsultingLLC> | Twitter
<https://twitter.com/avalonconsult>
