arrow-dev mailing list archives

From Uwe Korn <>
Subject Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)
Date Wed, 07 Sep 2016 04:51:15 GMT

I'm also in favour of switching the dependency direction between Parquet
and Arrow, as this would avoid a lot of duplicate code in both projects
and let parquet-cpp profit from functionality that is available in Arrow.

@wesm: go ahead with the JIRAs and I'll add comments or will pick some 
of them up.



On 07.09.16 04:41, Wes McKinney wrote:
> hi Julien,
> It makes sense to move the Parquet support for Arrow into Parquet
> itself and invert the dependency. I had thought that the coupling to
> Arrow C++'s IO subsystem might be tighter, but the connection between
> memory allocators and file abstractions is fairly simple:
> I'll open appropriate JIRAs and Uwe and I can coordinate on the refactoring.
> The exposure of the Parquet functionality in Python should stay inside
> Arrow for now, but mainly because it would make developing the Python
> side of things much more difficult if we split things up right now.
> - Wes
> On Tue, Sep 6, 2016 at 8:27 PM, Brian Bowman <> wrote:
>> Forgive me if interposing my first post for the Apache Arrow project on this
>> thread is incorrect procedure.
>> What Julien proposes with each storage layer producing Arrow Record Batches is
>> exactly how I envision it working and would certainly make Arrow integration
>> with SAS much more palatable. This is likely true for other storage layer
>> providers as well.
>> Brian Bowman (SAS)
>>> On Sep 6, 2016, at 7:52 PM, Julien Le Dem <> wrote:
>>> Thanks Wes,
>>> No worries, I know you are on top of those things.
>>> On a side note, I was wondering if the arrow-parquet integration should be
>>> in Parquet instead.
>>> Parquet would depend on Arrow and not the other way around.
>>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>>> ...) provides a way to produce Arrow Record Batches.
>>> thoughts?
>>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <> wrote:
>>>> hi Julien,
>>>> I'm very sorry about the inconvenience with this and the delay in
>>>> getting it sorted out. I will triage this evening by disabling the
>>>> Parquet tests in Arrow until we get the current problems under
>>>> control. When we re-enable the Parquet tests in Travis CI I agree we
>>>> should pin the version SHA.
>>>> - Wes
>>>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <> wrote:
>>>>> The Arrow cpp travis-ci build is broken right now because it depends on
>>>>> parquet-cpp, which has changed in an incompatible way. [1] [2] (or so it
>>>>> looks to me)
>>>>> Since parquet-cpp is not released yet it is totally fine to make
>>>>> incompatible API changes.
>>>>> However, we may want to pin the Arrow to Parquet dependency (on a git sha?)
>>>>> to prevent cross project changes from breaking the master build.
>>>>> Since I'm not one of the core cpp devs on those projects I mainly want to
>>>>> start that conversation rather than prescribe a solution. Feel free to take
>>>>> this as a straw man and suggest something else.
>>>>> [1]
>>>>> [2]
>>>> 5af150dd31/ci/
>>>>> --
>>>>> Julien
>>> --
>>> Julien
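
The layering Julien describes in this thread (Arrow provides the API, and each storage layer provides a way to produce Arrow Record Batches) could be sketched roughly as below. This is only an illustration under stated assumptions: `RecordBatch`, `RecordBatchSource`, `InMemoryRowSource` and `count_rows` are hypothetical names invented for this sketch, and none of them is the actual Arrow or parquet-cpp API.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol


@dataclass
class RecordBatch:
    """Hypothetical stand-in for an Arrow record batch: named columns of equal length."""
    columns: Dict[str, List[object]]

    @property
    def num_rows(self) -> int:
        # All columns are assumed to have the same length; an empty batch has 0 rows.
        return len(next(iter(self.columns.values()), []))


class RecordBatchSource(Protocol):
    """Hypothetical interface owned by the 'Arrow' side of the dependency."""

    def read_batches(self) -> Iterator[RecordBatch]:
        ...


class InMemoryRowSource:
    """Hypothetical storage layer. It depends on the interface above, not the reverse."""

    def __init__(self, rows: List[Dict[str, object]], batch_size: int) -> None:
        self.rows = rows
        self.batch_size = batch_size

    def read_batches(self) -> Iterator[RecordBatch]:
        # Slice the row-oriented data into chunks and pivot each chunk into columns.
        for start in range(0, len(self.rows), self.batch_size):
            chunk = self.rows[start:start + self.batch_size]
            names = list(chunk[0].keys())
            yield RecordBatch({name: [row[name] for row in chunk] for name in names})


def count_rows(source: RecordBatchSource) -> int:
    """Consumer code that only knows the interface, not the storage layer behind it."""
    return sum(batch.num_rows for batch in source.read_batches())


source = InMemoryRowSource(
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}],
    batch_size=2,
)
print(count_rows(source))
```

The point of the sketch is the direction of the dependency: the consumer and the batch type live on one side, and a storage layer (Parquet, Kudu, Cassandra, ...) only has to implement the producing side.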
