hawq-dev mailing list archives

From Lei Chang <lei_ch...@apache.org>
Subject Re: Support orc format
Date Tue, 21 Jun 2016 00:54:52 GMT
On Tue, Jun 21, 2016 at 8:38 AM, Roman Shaposhnik <roman@shaposhnik.org>
wrote:

> On Fri, Jun 17, 2016 at 3:02 AM, Ming Li <mli@pivotal.io> wrote:
> > Hi Guys,
> >
> > ORC (Optimized Row Columnar) is a very popular open source format adopted
> > by several major components in the Hadoop ecosystem. It is also used by a
> > lot of users. The advantages of supporting ORC storage in HAWQ are twofold:
> > first, it makes HAWQ more Hadoop-native, so it interacts with other
> > components more easily; second, ORC stores some meta info for query
> > optimization, so it might potentially outperform the two native formats
> > (i.e., AO and Parquet) if it is available.
> >
> > Since there are lots of popular formats available in the HDFS community, and
> > more advanced formats are emerging frequently, it is a good option for HAWQ
> > to design a general framework that supports pluggable C/C++ formats such as
> > ORC, as well as native formats such as AO and Parquet. In designing this
> > framework, we also need to support data stored in different file systems:
> > HDFS, local disk, Amazon S3, etc. Thus, it is better to offer a framework
> > that supports pluggable formats and pluggable file systems.
> >
> > We are proposing to support ORC in JIRA (
> > https://issues.apache.org/jira/browse/HAWQ-786). Please see the design spec
> > in the JIRA.
> >
> > Your comments are appreciated!
>
> This sounds reasonable, but I'd like to understand the trade-offs between
> supporting something like ORC in PXF vs. implementing it natively in C/C++.
>
> Is there any hard performance/etc. data that you could share to illuminate
> the trade-offs between these two approaches?
>

Implementing it natively in C/C++ will give performance at least comparable
to that of the current native AO and Parquet formats.

And we know that AO and Parquet are faster than PXF, so we are expecting
better performance here.

Cheers
Lei
