arrow-dev mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: How Arrow is related to Parquet
Date Tue, 22 Mar 2016 18:15:06 GMT
> It's not 100% clear whether putting the adapter library in
> apache/parquet-cpp or apache/arrow makes more sense, if others have
> opinions.

Just an idea: it might be nice to start using
https://ci.apache.org/buildbot.html to build snapshots and then download a
snapshot instead of building from scratch each time. This could save a fair
amount of build time. In the long term, we could split non-core components
out into smaller individual projects that depend on the core of each
project, and so avoid the debate over which way the dependencies should
point.

This might also help with tracking the performance of benchmark runs.
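A minimal sketch of the layering described above. All names here (columnar_core, file_core, adapter, Int64Column, DecodedPage, ToColumn) are hypothetical stand-ins, not the real Arrow or parquet-cpp APIs. Neither core namespace refers to the other; only the adapter depends on both, so it could live in either repository or in a separate project. The decoding is a simplified form of Parquet-style definition levels for a flat optional column (1 = value present, 0 = null).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an in-memory columnar core (Arrow-like).
// It has no knowledge of any file format.
namespace columnar_core {
struct Int64Column {
  std::vector<std::int64_t> values;  // one slot per row; null slots hold 0
  std::vector<bool> valid;           // validity flag per row
};
}  // namespace columnar_core

// Hypothetical stand-in for a file-format core (Parquet-like).
// It has no knowledge of the in-memory columnar library.
namespace file_core {
struct DecodedPage {
  std::vector<std::int64_t> non_null_values;  // only the values that are present
  std::vector<std::int16_t> def_levels;       // per row: 1 = present, 0 = null
};
}  // namespace file_core

// The adapter is the only piece that depends on both cores, so its extra
// dependencies only need to be built when the adapter itself is built.
namespace adapter {
columnar_core::Int64Column ToColumn(const file_core::DecodedPage& page) {
  columnar_core::Int64Column out;
  out.values.reserve(page.def_levels.size());
  out.valid.reserve(page.def_levels.size());
  std::size_t next = 0;  // index into non_null_values
  for (std::int16_t level : page.def_levels) {
    if (level == 1) {
      // .at() throws std::out_of_range if the page has too few values
      out.values.push_back(page.non_null_values.at(next++));
      out.valid.push_back(true);
    } else {
      out.values.push_back(0);
      out.valid.push_back(false);
    }
  }
  return out;
}
}  // namespace adapter
```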

On Mon, Mar 21, 2016 at 10:19 AM, Wes McKinney <wes@cloudera.com> wrote:

> hi Kai,
>
> Arrow C++ does not strictly depend on Parquet C++. I have been working
> on parquet-cpp (http://github.com/apache/parquet-cpp) and intend to
> create an optional Parquet-Arrow adapter library that links to
> libparquet.so and provides a read and write path for Arrow data. It's
> not 100% clear whether putting the adapter library in
> apache/parquet-cpp or apache/arrow makes more sense, if others have
> opinions.
>
> The downside of putting the Arrow-Parquet C++ adapter code in
> apache/arrow is continuous integration -- building Thrift and the
> other parquet-cpp dependencies to run the unit tests might become
> onerous. That being said, I *do* need to be able to run unit tests for
> the PyArrow Parquet read/write path. I'm starting work on this in the
> next few days, in fact.
>
> - Wes
>
> On Mon, Mar 21, 2016 at 8:57 AM, Zheng, Kai <kai.zheng@intel.com> wrote:
> > Hi,
> >
> > From a quick look at the code, it looks like Arrow depends on
> > Parquet; however, Parquet seems kind of heavy for Arrow. I'm not sure
> > exactly which part of Parquet Arrow is using. I wonder whether the
> > reverse would be better: in the Parquet project, add a new reader that
> > reads Parquet data into an Arrow representation, so that Parquet depends
> > on Arrow instead. I noticed there was some effort (PARQUET-131) to read
> > Parquet data into column vectors; I wonder if that is very similar to
> > what Arrow needs.
> >
> > Will we support other formats, like ORC files, as well? If so, how
> > would we handle the relationship in a similar way? Thanks.
> >
> > Regards,
> > Kai
> >
>
