From: Wes McKinney
Date: Mon, 29 Jun 2020 16:24:31 -0500
Subject: Re: Streaming use cases
To: user@arrow.apache.org
Content-Type:
text/plain; charset="UTF-8"

On Mon, Jun 29, 2020 at 4:15 PM Cindy McMullen wrote:
>
> Hi, Wes -
>
> Yes, we're using Java/Scala, but also have a good Python code base for our data scientists. Our goal is to replace storage/representation of Thrift for ML features with some more OSS-friendly format, such as Parquet or Avro, and avoid writing multiple adapters.
>
> Ideally, we could stream data from Parquet disk in batches into Arrow-compatible consumers. Is this a reasonable fit for something like Arrow Flight?

Yes, Flight is definitely designed for that -- fast / efficient
delivery of Arrow record batches over TCP.

>
> On Mon, Jun 29, 2020 at 2:37 PM Wes McKinney wrote:
>>
>> hi Cindy,
>>
>> Could you clarify which PL you are working in (though assuming Scala /
>> Java judging by your e-mail address)?
>>
>> In C++ we have reasonably mature Parquet->Arrow reading but not yet
>> conversion from Arrow to Avro. In Java, I am not sure what is the
>> state of the art for getting Parquet into Arrow but this code does not
>> live in Apache Arrow -- I know that Apache Iceberg has done some work
>> around this but I'm not sure how consumable it is as a library.
>> Java-Arrow does have some preliminary support for converting Arrow to
>> Avro, I believe. So there's some engineering here to do in any case.
>>
>> best,
>> Wes
>>
>> On Mon, Jun 29, 2020 at 2:45 PM Cindy McMullen wrote:
>> >
>> > Can I use Arrow to stream data from a Parquet file source and consume it via Avro?