From: Micah Kornfield
Reply-To: emkornfield@gmail.com
Date: Wed, 6 Mar 2019 22:49:48 -0800
Subject: Re: Depending on non-released Apache projects (C++ Avro)
To: dev@arrow.apache.org

Thanks for the input, Wes and Uwe. Given that no one from the Avro
community has chimed in, I will try to reach out on their dev mailing
list.

Uwe, I'm not sure I understand what type of support/help you are
thinking of. Could you elaborate a little bit more before I reach out?

-Micah

On Tue, Mar 5, 2019 at 4:53 PM Wes McKinney wrote:

> I am OK with that, but if we find ourselves making compromises that
> affect performance or memory efficiency (where possibly invasive
> refactoring may be required) perhaps we should reconsider option #3.
>
> On Tue, Mar 5, 2019 at 11:29 AM Uwe L. Korn wrote:
> >
> > I'm leaning a bit towards 1) but I would love to get some input
> > from the Avro community, as 1) also depends on their side: we will
> > submit some patches upstream that need to be reviewed and someday
> > also released.
> >
> > Are Avro committers subscribed here or should we reach out to them
> > on their ML? Given that we are quite active in the C++ space
> > currently, I feel that we can contribute quite some infrastructure
> > in building and packaging that we do either way for Arrow. This
> > might be quite helpful for a project. We have seen this with
> > Parquet, where much of the development is just happening as it is
> > part of Arrow. (Not suggesting to merge/fork the Avro codebase but
> > just to apply some of the best practices we learned while building
> > Arrow).
> >
> > Uwe
> >
> > On Tue, Mar 5, 2019, at 4:57 PM, Wes McKinney wrote:
> > > I'd be +0.5 in favor of forking in this particular case. Since
> > > Avro is not vectorized (unlike Parquet and ORC) I suspect it may
> > > be more difficult to get the best performance using a general
> > > purpose API versus one that is more specialized to producing
> > > Arrow record batches. Given that there has been relatively light
> > > C++ development activity in Apache Avro and no releases for 2
> > > years, it does give me pause.
> > >
> > > We might want to look at Impala's Avro scanner, they are doing
> > > some LLVM IR cross-compilation also (they're using the Avro C++
> > > library though)
> > >
> > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner-ir.cc
> > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner.cc
> > >
> > > On Tue, Mar 5, 2019 at 1:01 AM Micah Kornfield wrote:
> > > >
> > > > I'm looking at incorporating Avro in Arrow C++ [1]. It seems
> > > > that the Avro C++ library APIs have improved since the last
> > > > release. However, it is not clear when a new release will be
> > > > available (I asked on the JIRA item for the next release [2]
> > > > and received no response).
> > > >
> > > > I was wondering if there is a policy governing using other
> > > > Apache projects or how people felt about the following options:
> > > > 1. Depend on a specific git commit through the third-party
> > > >    library system.
> > > > 2. Copy the necessary source code temporarily to our project,
> > > >    and change to using the next release when it is available.
> > > > 3. Fork the code we need (the main benefits I see here are
> > > >    being able to refactor it to avoid having to deal with
> > > >    exceptions, easier integration with our IO system, and one
> > > >    less 3rd party dependency to deal with).
> > > > 4. Wait on the 1.9 release before proceeding.
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1] https://issues.apache.org/jira/browse/ARROW-1209
> > > > [2] https://issues.apache.org/jira/browse/AVRO-2250