arrow-dev mailing list archives

From Julien Le Dem <jul...@dremio.com>
Subject Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)
Date Wed, 07 Sep 2016 21:30:11 GMT
@Wes, Uwe: Thank you!

@Brian: no procedure required :) Thanks for your feedback.
We're happy to hear more about SAS integration. Feel free to send a blurb
to the list.

On Tue, Sep 6, 2016 at 9:51 PM, Uwe Korn <uwelk@xhochy.com> wrote:

> Hello,
>
> I'm also in favour of switching the dependency direction between Parquet
> and Arrow, as this would avoid a lot of duplicate code in both projects and
> let parquet-cpp profit from functionality that is already available in Arrow.
>
> @wesm: go ahead with the JIRAs and I'll comment on them or pick some of
> them up.
>
> Cheers
>
> Uwe
>
>
>
> On 07.09.16 04:41, Wes McKinney wrote:
>
>> hi Julien,
>>
>> It makes sense to move the Parquet support for Arrow into Parquet
>> itself and invert the dependency. I had thought that the coupling to
>> Arrow C++'s IO subsystem might be tighter, but the connection between
>> memory allocators and file abstractions is fairly simple:
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/parquet/io.h
>>
>> I'll open appropriate JIRAs and Uwe and I can coordinate on the
>> refactoring.
>>
>> The exposure of the Parquet functionality in Python should stay inside
>> Arrow for now, mainly because splitting things up right now would make
>> developing the Python side of things much more difficult.
>>
>> - Wes
>>
>> On Tue, Sep 6, 2016 at 8:27 PM, Brian Bowman <Brian.Bowman@sas.com>
>> wrote:
>>
>>> Forgive me if interposing my first post for the Apache Arrow project on
>>> this thread is incorrect procedure.
>>>
>>> What Julien proposes with each storage layer producing Arrow Record
>>> Batches is exactly how I envision it working and would certainly make Arrow
>>> integration with SAS much more palatable.  This is likely true for other
>>> storage layer providers as well.
>>>
>>> Brian Bowman (SAS)
>>>
>>> On Sep 6, 2016, at 7:52 PM, Julien Le Dem <julien@dremio.com> wrote:
>>>>
>>>> Thanks Wes,
>>>> No worries, I know you are on top of those things.
>>>> On a side note, I was wondering if the arrow-parquet integration should
>>>> be in Parquet instead.
>>>> Parquet would depend on Arrow and not the other way around.
>>>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>>>> ...) provides a way to produce Arrow Record Batches.
>>>> thoughts?
>>>>
>>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
>>>>>
>>>>> hi Julien,
>>>>>
>>>>> I'm very sorry about the inconvenience with this and the delay in
>>>>> getting it sorted out. I will triage this evening by disabling the
>>>>> Parquet tests in Arrow until we get the current problems under
>>>>> control. When we re-enable the Parquet tests in Travis CI I agree we
>>>>> should pin the version SHA.
>>>>>
>>>>> - Wes
>>>>>
>>>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <julien@dremio.com> wrote:
>>>>>> The Arrow cpp travis-ci build is broken right now because it depends
>>>>>> on parquet-cpp, which has changed in an incompatible way [1] [2] (or so
>>>>>> it looks to me).
>>>>>> Since parquet-cpp is not released yet, it is totally fine to make
>>>>>> incompatible API changes.
>>>>>> However, we may want to pin the Arrow-to-Parquet dependency (on a git
>>>>>> SHA?) to prevent cross-project changes from breaking the master build.
>>>>>> Since I'm not one of the core C++ devs on those projects, I mainly want
>>>>>> to start that conversation rather than prescribe a solution. Feel free
>>>>>> to take this as a straw man and suggest something else.
>>>>>>
>>>>>> [1] https://travis-ci.org/apache/arrow/jobs/156080555
>>>>>> [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
>>>>>>
>>>>>> --
>>>>>> Julien
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Julien
>>>>
>>>
>


-- 
Julien
