arrow-dev mailing list archives

From Pedro Miguel Duarte <duarte.gelvez.pedromig...@gmail.com>
Subject Re: Comparing with Parquet
Date Fri, 26 Feb 2016 01:11:53 GMT
I was wondering if someone could also elaborate on the comparison with
Tachyon (now called Alluxio).
On Feb 25, 2016 5:08 PM, "Chenliang (Liang, DataSight)" <
chenliang613@huawei.com> wrote:

> I agree with Henry Robinson's points.
>
> In addition, Arrow is well suited to exchanging data efficiently, but the
> data size it can handle may be limited to the TB level. Parquet can handle
> much larger datasets, but its performance does not support fast queries.
>
> So for PB-level data with interactive queries (latency on the order of
> seconds), does neither format solve the problem?
>
> Regards
> Liang
> -----Original Message-----
> From: Henry Robinson [mailto:henry@cloudera.com]
> Sent: February 26, 2016 0:20
> To: dev@arrow.apache.org
> Subject: Re: Comparing with Parquet
>
> Think of Parquet as a format well-suited to writing very large datasets to
> disk, whereas Arrow is a format most suited to efficient storage in memory.
> You might read Parquet files from disk, and then materialize them in memory
> in Arrow's format.
>
> Both formats are designed around the idiosyncrasies of the target medium:
> Parquet is not designed to support efficient random access because disks
> aren't good at that, but Arrow has fast random access as a core design
> principle, to give just one example.
>
> Henry
>
> > On Feb 25, 2016, at 8:10 AM, Sourav Mazumder
> > <sourav.mazumder00@gmail.com> wrote:
> >
> > Hi All,
> >
> > New to this. And still trying to figure out where exactly Arrow fits
> > in the ecosystem of various Big Data technologies.
> >
> > In that respect, the first thing that came to my mind is how Arrow
> > compares with Parquet.
> >
> > In my understanding Parquet also supports a very efficient columnar
> > format (with support for nested structure). It is already embraced
> > (supported) by various technologies like Impala (origin), Spark, Drill,
> > etc.
> >
> > The only thing I see missing in Parquet is support for SIMD-based
> > vectorized operations.
> >
> > Am I right, or am I missing many other differences between Arrow and
> > Parquet?
> >
> > Regards,
> > Sourav
>
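
Henry's point about random access can be illustrated with a small sketch. It
uses neither the Parquet nor the Arrow libraries: a run-length-encoded list
stands in for a compact on-disk encoding, and a contiguous fixed-width buffer
stands in for an in-memory columnar layout. The names rle_column, rle_get and
materialize are hypothetical and chosen only for this illustration.

```python
from array import array

# Hypothetical run-length-encoded column: (value, run_length) pairs.
# This stands in for a compact, scan-oriented on-disk encoding.
rle_column = [(7, 3), (42, 2), (9, 4)]


def rle_get(runs, index):
    """Random access on the encoded form: walk the runs until the index is covered.

    The cost grows with the number of runs that precede the index.
    """
    for value, length in runs:
        if index < length:
            return value
        index -= length
    raise IndexError("index out of range")


def materialize(runs):
    """Decode the runs once into a contiguous buffer of 64-bit integers."""
    out = array("q")
    for value, length in runs:
        out.extend([value] * length)
    return out


# After decoding, element i lives at a fixed offset from the start of the
# buffer, so a lookup is a single constant-time index operation.
in_memory = materialize(rle_column)
fifth = in_memory[4]
```

The encoded form is smaller and reads well as a sequential scan, which suits
disk. The decoded buffer costs more memory but makes every lookup a fixed
offset computation, which is the trade-off described in the thread.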
