hawq-user mailing list archives

From Dan Baskette <dbbaske...@gmail.com>
Subject Re: what is Hawq?
Date Sat, 14 Nov 2015 00:45:42 GMT
No, truncate was added to Apache Hadoop

https://issues.apache.org/jira/plugins/servlet/mobile#issue/hdfs-3107

Sent from my iPhone

> On Nov 13, 2015, at 7:39 PM, Bob Marshall <marshallb@avalonconsult.com> wrote:
> 
> I stand corrected. But I had a question:
> 
> In Pivotal Hadoop HDFS, we added truncate to support transactions. The signature of
truncate is as follows: void truncate(Path src, long length) throws IOException; The truncate()
function truncates the file to a size less than or equal to the file length. If the
size of the file is smaller than the target length, an IOException is thrown. This is different
from POSIX truncate semantics. The rationale is that HDFS does not support overwriting at
arbitrary positions.
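The truncate() semantics quoted above can be sketched as a simulation on a local file (illustrative Python, not actual HDFS client code; the function name is made up):

```python
import os

def hdfs_style_truncate(path, length):
    """Truncate the file at `path` to `length` bytes, HDFS-style.

    Shrinking is allowed, but a target length greater than the current
    file size raises IOError (unlike POSIX truncate, which zero-fills),
    because HDFS cannot overwrite or extend at arbitrary positions.
    """
    size = os.path.getsize(path)
    if length > size:
        raise IOError(
            "target length %d exceeds file size %d" % (length, size))
    with open(path, "r+b") as f:
        f.truncate(length)
```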
> 
> Does this mean I need to run a modified HDFS to run HAWQ?
> 
> Robert L Marshall
> Senior Consultant | Avalon Consulting, LLC
> c: (210) 853-7041
> LinkedIn | Google+ | Twitter
> -------------------------------------------------------------------------------------------------------------
> This message (including any attachments) contains confidential information 
> intended for a specific individual and purpose, and is protected by law. If 
> you are not the intended recipient, you should delete this message. Any 
> disclosure, copying, or distribution of this message, or the taking of any 
> action based on it, is strictly prohibited.
> 
>> On Fri, Nov 13, 2015 at 7:16 PM, Dan Baskette <dbbaskette@gmail.com> wrote:
>> But HAWQ does manage its own storage on HDFS.  You can leverage the native HAWQ format
or Parquet.  Its PXF functionality allows querying files in other formats.   So, by your
(and my) definition it is indeed a database.  
>> 
>> Sent from my iPhone
>> 
>>> On Nov 13, 2015, at 7:08 PM, Bob Marshall <marshallb@avalonconsult.com>
wrote:
>>> 
>>> Chhavi Joshi is right on the money. A database is both a query execution tool
and a data storage backend. HAWQ is executing against native Hadoop storage, i.e. HBase, HDFS,
etc.
>>> 
>>> Robert L Marshall
>>> Senior Consultant | Avalon Consulting, LLC
>>> c: (210) 853-7041
>>> LinkedIn | Google+ | Twitter
>>> 
>>>> On Fri, Nov 13, 2015 at 10:41 AM, Chhavi Joshi <Chhavi.Joshi@techmahindra.com>
wrote:
>>>> If you have the HAWQ-Greenplum integration, you can create external tables
in Greenplum, as in Hive.
>>>> 
>>>> To load data into those tables, you just need to put the files into HDFS (the same
as external tables in Hive).
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> I still believe HAWQ is only the SQL query engine not a database.
>>>> 
>>>>  
>>>> 
>>>> Chhavi
>>>> 
>>>> From: Atri Sharma [mailto:atri@apache.org] 
>>>> Sent: Friday, November 13, 2015 3:53 AM
>>>> 
>>>> 
>>>> To: user@hawq.incubator.apache.org
>>>> Subject: Re: what is Hawq?
>>>>  
>>>> 
>>>> Greenplum is open sourced.
>>>> 
>>>> The main difference between the two engines is that HAWQ targets Hadoop-based
systems whereas Greenplum targets a regular filesystem. This is a very high-level distinction
between the two; the differences are more detailed, but a single-line summary is the one
I wrote.
>>>> 
>>>> On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA" <adaryl.wakefield@hotmail.com>
wrote:
>>>> 
>>>> Is Greenplum free? I heard they open sourced it but I haven’t found anything
but a community edition.
>>>> 
>>>>  
>>>> 
>>>> Adaryl "Bob" Wakefield, MBA
>>>> Principal
>>>> Mass Street Analytics, LLC
>>>> 913.938.6685
>>>> www.linkedin.com/in/bobwakefieldmba
>>>> Twitter: @BobLovesData
>>>> 
>>>>  
>>>> 
>>>> From: dortmont
>>>> 
>>>> Sent: Friday, November 13, 2015 2:42 AM
>>>> 
>>>> To: user@hawq.incubator.apache.org
>>>> 
>>>> Subject: Re: what is Hawq?
>>>> 
>>>>  
>>>> 
>>>> I see the advantage of HAWQ compared to other Hadoop SQL engines. It looks
like the most mature solution on Hadoop thanks to the postgresql based engine.
>>>> 
>>>>  
>>>> 
>>>> But why wouldn't I use Greenplum instead of HAWQ? It has even better performance
and it supports updates.
>>>> 
>>>> 
>>>> Cheers
>>>> 
>>>>  
>>>> 
>>>> 2015-11-13 7:45 GMT+01:00 Atri Sharma <atri@apache.org>:
>>>> 
>>>> +1 for transactions.
>>>> 
>>>> I think a major plus point is that HAWQ supports transactions,  and this
enables a lot of critical workloads to be done on HAWQ.
>>>> 
>>>> On 13 Nov 2015 12:13, "Lei Chang" <chang.lei.cn@gmail.com> wrote:
>>>> 
>>>>  
>>>> 
>>>> Like what Bob said, HAWQ is a complete database and Drill is just a query
engine.
>>>> 
>>>>  
>>>> 
>>>> And HAWQ also has a lot of other benefits over Drill, for example:
>>>> 
>>>>  
>>>> 
>>>> 1. SQL completeness: HAWQ is the most complete of the SQL-on-Hadoop engines; it can
run all TPC-DS queries without any changes, and supports almost all third-party tools, such
as Tableau et al.
>>>> 
>>>> 2. Performance: proven the best in the Hadoop world
>>>> 
>>>> 3. Scalability: highly scalable via a high-speed UDP-based interconnect.
>>>> 
>>>> 4. Transactions: as far as I know, Drill does not support transactions. It is a
nightmare for end users to maintain consistency without them.
>>>> 
>>>> 5. Advanced resource management: HAWQ has the most advanced resource management.
It natively supports YARN and easy-to-use hierarchical resource queues. Resources can be
managed and enforced at the query and operator level.
>>>> 
>>>>  
>>>> 
>>>> Cheers
>>>> 
>>>> Lei
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com>
wrote:
>>>> 
>>>> There are a lot of tools that do a lot of things. Believe me, it’s a full
time job keeping track of what is going on in the Apache world. As I understand it, Drill
is just a query engine while HAWQ is an actual database... somewhat, anyway.
>>>> 
>>>>  
>>>> 
>>>> Adaryl "Bob" Wakefield, MBA
>>>> Principal
>>>> Mass Street Analytics, LLC
>>>> 913.938.6685
>>>> www.linkedin.com/in/bobwakefieldmba
>>>> Twitter: @BobLovesData
>>>> 
>>>>  
>>>> 
>>>> From: Will Wagner
>>>> 
>>>> Sent: Thursday, November 12, 2015 7:42 AM
>>>> 
>>>> To: user@hawq.incubator.apache.org
>>>> 
>>>> Subject: Re: what is Hawq?
>>>> 
>>>>  
>>>> 
>>>> Hi Lei,
>>>> 
>>>> Great answer.
>>>> 
>>>> I have a follow up question. 
>>>> Everything HAWQ is capable of doing is already covered by Apache Drill. 
Why do we need another tool?
>>>> 
>>>> Thank you, 
>>>> Will W
>>>> 
>>>> On Nov 12, 2015 12:25 AM, "Lei Chang" <chang.lei.cn@gmail.com> wrote:
>>>> 
>>>>  
>>>> 
>>>> Hi Bob,
>>>> 
>>>>  
>>>> 
>>>> Apache HAWQ is a Hadoop-native SQL query engine that combines the key technological
advantages of an MPP database with the scalability and convenience of Hadoop. HAWQ reads data
from and writes data to HDFS natively. HAWQ delivers industry-leading performance and linear
scalability. It provides users the tools to confidently and successfully interact with
petabyte-range data sets. HAWQ provides users with a complete, standards-compliant SQL
interface. More specifically, HAWQ has the following features:
>>>> ·         On-premise or cloud deployment
>>>> 
>>>> ·         Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension
>>>> 
>>>> ·         Extremely high performance: many times faster than other Hadoop
SQL engines
>>>> 
>>>> ·         World-class parallel optimizer
>>>> 
>>>> ·         Full transaction capability and consistency guarantee: ACID
>>>> 
>>>> ·         Dynamic data flow engine through high speed UDP based interconnect
>>>> 
>>>> ·         Elastic execution engine based on virtual segment & data locality
>>>> 
>>>> ·         Support for multi-level partitioning and List/Range-based partitioned
tables
>>>> 
>>>> ·         Multiple compression method support: snappy, gzip, quicklz, RLE
>>>> 
>>>> ·         Multi-language user defined function support: python, perl, java,
c/c++, R
>>>> 
>>>> ·         Advanced machine learning and data mining functionalities through
MADLib
>>>> 
>>>> ·         Dynamic node expansion: in seconds
>>>> 
>>>> ·         Most advanced three-level resource management: integrates with
YARN and hierarchical resource queues
>>>> 
>>>> ·         Easy access to all HDFS data and external system data (for example,
HBase)
>>>> 
>>>> ·         Hadoop Native: from storage (HDFS), resource management (YARN)
to deployment (Ambari).
>>>> 
>>>> ·         Authentication & Granular authorization: Kerberos, SSL and
role based access
>>>> 
>>>> ·         Advanced C/C++ access library to HDFS and YARN: libhdfs3 &
libYARN
>>>> 
>>>> ·         Support most third party tools: Tableau, SAS et al.
>>>> 
>>>> ·         Standard connectivity: JDBC/ODBC
>>>> 
>>>>  
>>>> 
>>>> And the link here can give you more information around hawq: https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ

>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> And please also see the answers inline to your specific questions:
>>>> 
>>>>  
>>>> 
>>>> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefield@hotmail.com>
wrote:
>>>> 
>>>> Silly question right? Thing is I’ve read a bit and watched some YouTube
videos and I’m still not quite sure what I can and can’t do with Hawq. Is it a true database
or is it like Hive where I need to use HCatalog?
>>>> 
>>>>  
>>>> 
>>>> It is a true database: you can think of it as a parallel PostgreSQL, but with
much more functionality, and it works natively in the Hadoop world. HCatalog is not necessary,
but you can read data registered in HCatalog with the new "HCatalog integration" feature.
>>>> 
>>>>  
>>>> 
>>>> Can I write data intensive applications against it using ODBC? Does it enforce
referential integrity? Does it have stored procedures?
>>>> 
>>>>  
>>>> 
>>>> ODBC: yes, both JDBC/ODBC are supported
>>>> 
>>>> Referential integrity: currently not supported.
>>>> 
>>>> Stored procedures: yes.
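Since HAWQ is PostgreSQL-derived, JDBC/ODBC clients typically connect with standard PostgreSQL drivers. A minimal connectivity sketch in Python, assuming the psycopg2 driver and placeholder host, database, and user names (none of these names come from this thread):

```python
# Connectivity sketch for a HAWQ master using psycopg2, a standard
# PostgreSQL driver. Host "hawq-master", database "postgres", and user
# "gpadmin" below are placeholder assumptions.

def hawq_dsn(host, port, dbname, user):
    """Build a libpq-style connection string for a HAWQ master."""
    return "host={} port={} dbname={} user={}".format(host, port, dbname, user)

def run_query(dsn, sql):
    # Imported here so the sketch can be read without the driver installed.
    import psycopg2
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()

if __name__ == "__main__":
    dsn = hawq_dsn("hawq-master", 5432, "postgres", "gpadmin")
    print(run_query(dsn, "SELECT version()"))
```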
>>>> 
>>>>  
>>>> 
>>>> B.
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> Please let us know if you have any other questions.
>>>> 
>>>>  
>>>> 
>>>> Cheers
>>>> 
>>>> Lei
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
> 
