Subject: Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)
From: Uwe Korn
To: dev@parquet.apache.org, dev@arrow.apache.org
Date: Wed, 7 Sep 2016 06:51:15 +0200

Hello,

I'm also in favour of switching the dependency direction between Parquet and Arrow, as this would avoid a lot of duplicate code in both projects and would let parquet-cpp profit from functionality that is already available in Arrow.

@wesm: go ahead with the JIRAs and I'll add comments or pick some of them up.

Cheers
Uwe

On 07.09.16 04:41, Wes McKinney wrote:
> hi Julien,
>
> It makes sense to move the Parquet support for Arrow into Parquet
> itself and invert the dependency.
> I had thought that the coupling to
> Arrow C++'s IO subsystem might be tighter, but the connection between
> memory allocators and file abstractions is fairly simple:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/parquet/io.h
>
> I'll open appropriate JIRAs and Uwe and I can coordinate on the refactoring.
>
> The exposure of the Parquet functionality in Python should stay inside
> Arrow for now, but mainly because it would make developing the Python
> side of things much more difficult if we split things up right now.
>
> - Wes
>
> On Tue, Sep 6, 2016 at 8:27 PM, Brian Bowman wrote:
>> Forgive me if interposing my first post for the Apache Arrow project on this thread is incorrect procedure.
>>
>> What Julien proposes with each storage layer producing Arrow Record Batches is exactly how I envision it working and would certainly make Arrow integration with SAS much more palatable. This is likely true for other storage layer providers as well.
>>
>> Brian Bowman (SAS)
>>
>>> On Sep 6, 2016, at 7:52 PM, Julien Le Dem wrote:
>>>
>>> Thanks Wes,
>>> No worries, I know you are on top of those things.
>>> On a side note, I was wondering if the arrow-parquet integration should be
>>> in Parquet instead.
>>> Parquet would depend on Arrow and not the other way around.
>>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>>> ...) provides a way to produce Arrow Record Batches.
>>> thoughts?
>>>
>>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney wrote:
>>>>
>>>> hi Julien,
>>>>
>>>> I'm very sorry about the inconvenience with this and the delay in
>>>> getting it sorted out. I will triage this evening by disabling the
>>>> Parquet tests in Arrow until we get the current problems under
>>>> control. When we re-enable the Parquet tests in Travis CI I agree we
>>>> should pin the version SHA.
>>>>
>>>> - Wes
>>>>
>>>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem wrote:
>>>>> The Arrow cpp travis-ci build is broken right now because it depends on
>>>>> parquet-cpp which has changed in an incompatible way. [1] [2] (or so it
>>>>> looks to me)
>>>>> Since parquet-cpp is not released yet it is totally fine to make
>>>>> incompatible API changes.
>>>>> However, we may want to pin the Arrow to Parquet dependency (on a git
>>>>> sha?) to prevent cross project changes from breaking the master build.
>>>>> Since I'm not one of the core cpp devs on those projects I mainly want to
>>>>> start that conversation rather than prescribe a solution. Feel free to
>>>>> take this as a straw man and suggest something else.
>>>>>
>>>>> [1] https://travis-ci.org/apache/arrow/jobs/156080555
>>>>> [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
>>>>>
>>>>> --
>>>>> Julien
>>>
>>> --
>>> Julien
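Julien's proposal inverts the dependency: Arrow defines the in-memory format and a reader interface, and each storage layer (Parquet, Kudu, Cassandra, ...) implements that interface to produce record batches. The following is a minimal C++ sketch of that direction only. All names in it (`core::RecordBatch`, `core::RecordBatchReader`, `core::SumAll`, `storage::InMemoryRowGroupReader`) are hypothetical and invented for illustration; they are not the actual Arrow or parquet-cpp API, and the single int64 column is a stand-in for a real columnar batch.

```cpp
// Hypothetical sketch of the proposed dependency direction. None of these
// types are the real Arrow or parquet-cpp APIs; they only illustrate that
// the core library defines the interface and storage layers implement it.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// --- "Arrow" side: owns the in-memory format and the producer interface. ---
namespace core {

// Toy stand-in for a record batch: one named int64 column.
struct RecordBatch {
  std::string column_name;
  std::vector<int64_t> values;
};

// Every storage layer implements this; the core library never includes
// any storage-layer headers.
class RecordBatchReader {
 public:
  virtual ~RecordBatchReader() = default;
  // Returns nullptr once the source is exhausted.
  virtual std::shared_ptr<RecordBatch> ReadNext() = 0;
};

// Generic consumer written only against the core interface.
inline int64_t SumAll(RecordBatchReader& reader) {
  int64_t total = 0;
  while (auto batch = reader.ReadNext()) {
    for (int64_t v : batch->values) total += v;
  }
  return total;
}

}  // namespace core

// --- "Parquet" side: depends on core, not the other way around. ---
namespace storage {

// Hypothetical reader that serves already-decoded row groups as batches,
// one batch per row group.
class InMemoryRowGroupReader : public core::RecordBatchReader {
 public:
  explicit InMemoryRowGroupReader(std::vector<std::vector<int64_t>> row_groups)
      : row_groups_(std::move(row_groups)) {}

  std::shared_ptr<core::RecordBatch> ReadNext() override {
    if (next_ >= row_groups_.size()) return nullptr;
    auto batch = std::make_shared<core::RecordBatch>();
    batch->column_name = "x";
    batch->values = row_groups_[next_++];
    return batch;
  }

 private:
  std::vector<std::vector<int64_t>> row_groups_;
  std::size_t next_ = 0;
};

}  // namespace storage
```

Under these assumptions, a reader built from the row groups `{1, 2, 3}` and `{4, 5}` makes `core::SumAll` return 15. The point of the layout is that `core` compiles without any storage-layer headers, so parquet-cpp (or another storage layer) can add readers without Arrow depending on it, which is the inversion discussed in this thread.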