From: Micah Kornfield
Reply-To: emkornfield@gmail.com
Date: Mon, 4 Jan 2021 12:12:55 -0800
Subject: Re: [Java AvroToArrow] Creating Arrow Files from Avro
To: user@arrow.apache.org

Hi John,

The overview of the Java API might help here [1]. I also wrote up some
notes on Avro->Arrow conversion for a different user question [2].
ARROW-9613 [3] is tracking the impedance mismatch I mentioned in the e-mail.

Hope this helps.

-Micah

[1] https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files
[2] https://lists.apache.org/thread.html/rfa51f801b752faa881d318cff7394ee5b43161c100a707810c6c92fd%40%3Cuser.arrow.apache.org%3E
[3] https://issues.apache.org/jira/browse/ARROW-9613

On Mon, Dec 28, 2020 at 10:33 PM John E. Conlon <jconlon@apache.org> wrote:

> Creating a DataEngineering pipeline that will transform binary Avro
> objects in S3 buckets to S3 Arrow objects and Parquet objects.
>
> See that Java libraries don't support Parquet at this time so I plan to
> first use the Arrow Java libraries for the Avro->Arrow transform and then
> use the Python Arrow to do the Arrow->Parquet transform.
>
> On the Java side I plan to download my Avro objects to a file, then create
> the Arrow files and then upload these.
>
> See the AvroToArrow.avroToArrowIterator(schema, decoder, config) also see
> the tests using AvroToArrow but even though I have read the limited
> documentation I am not sure how to go about using this to read the Avro
> files and write the output Arrow file.
>
> Can someone provide me with an example?