arrow-user mailing list archives

From: Micah Kornfield <emkornfi...@gmail.com>
Subject: Re: [Java AvroToArrow] Creating Arrow Files from Avro
Date: Mon, 04 Jan 2021 20:12:55 GMT

Hi John,
The overview of the Java API might help here [1].  I also wrote up some
notes on Avro->Arrow conversion for a different user question [2].
ARROW-9613 [3] is tracking the impedance mismatch I mentioned in that e-mail.
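
As a rough illustration of how the pieces in [1] and the Avro adapter could be wired together, here is a minimal sketch, not a definitive implementation. It assumes the input is a plain stream of binary-encoded Avro records with a known schema (an Avro object container file has a header and block framing that a bare decoder does not skip), and that the schema has no enum fields, so an empty dictionary provider is enough. The class name `AvroToArrowFileSketch` and the method `convert` are hypothetical. The imports use the `org.apache.arrow.adapter.avro` package found in recent releases; I believe older releases kept the same classes directly under `org.apache.arrow`, so adjust as needed. One wrinkle when combining the two APIs is that the iterator hands back a fresh VectorSchemaRoot per batch while ArrowFileWriter is bound to a single root, so the sketch copies each batch across with VectorUnloader/VectorLoader.

```java
import java.io.InputStream;
import java.nio.channels.WritableByteChannel;

import org.apache.arrow.adapter.avro.AvroToArrow;
import org.apache.arrow.adapter.avro.AvroToArrowConfig;
import org.apache.arrow.adapter.avro.AvroToArrowConfigBuilder;
import org.apache.arrow.adapter.avro.AvroToArrowVectorIterator;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.dictionary.DictionaryProvider;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

// Hypothetical helper class; the name is an assumption for illustration.
final class AvroToArrowFileSketch {

  /**
   * Reads binary-encoded Avro records from avroIn and writes them to arrowOut
   * in the Arrow file (random access) format. Returns the number of record
   * batches written. Closes arrowOut when done.
   */
  static int convert(Schema avroSchema, InputStream avroIn, WritableByteChannel arrowOut)
      throws Exception {
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(avroIn, null);
    int batches = 0;
    try (BufferAllocator allocator = new RootAllocator()) {
      AvroToArrowConfig config = new AvroToArrowConfigBuilder(allocator).build();
      VectorSchemaRoot target = null;   // the single root the file writer is bound to
      VectorLoader loader = null;
      ArrowFileWriter writer = null;
      try (AvroToArrowVectorIterator it =
          AvroToArrow.avroToArrowIterator(avroSchema, decoder, config)) {
        while (it.hasNext()) {
          // Each call to next() yields a new root that the caller must close.
          try (VectorSchemaRoot batch = it.next()) {
            if (writer == null) {
              // Create the writer lazily so it can reuse the converted Arrow schema.
              target = VectorSchemaRoot.create(batch.getSchema(), allocator);
              loader = new VectorLoader(target);
              // Assumption: no dictionary-encoded (enum) fields, so the provider is empty.
              writer = new ArrowFileWriter(
                  target, new DictionaryProvider.MapDictionaryProvider(), arrowOut);
              writer.start();
            }
            // Move the batch's buffers into the writer's root, then write it out.
            try (ArrowRecordBatch recordBatch = new VectorUnloader(batch).getRecordBatch()) {
              loader.load(recordBatch);
            }
            writer.writeBatch();
            batches++;
          }
        }
        if (writer != null) {
          writer.end();   // writes the file footer
        }
      } finally {
        if (writer != null) {
          writer.close(); // also closes the output channel
        }
        if (target != null) {
          target.close();
        }
      }
    }
    return batches;
  }
}
```

For the file-based flow described in the question, this could be called with a FileInputStream for the downloaded Avro data and the channel of a FileOutputStream for the Arrow file to upload. If the input is empty, nothing is written and the output channel is left untouched.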

Hope this helps.

-Micah

[1] https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files
[2] https://lists.apache.org/thread.html/rfa51f801b752faa881d318cff7394ee5b43161c100a707810c6c92fd%40%3Cuser.arrow.apache.org%3E
[3] https://issues.apache.org/jira/browse/ARROW-9613

On Mon, Dec 28, 2020 at 10:33 PM John E. Conlon <jconlon@apache.org> wrote:

> I am creating a data engineering pipeline that will transform binary Avro
> objects in S3 buckets into Arrow and Parquet objects in S3.
>
> I see that the Java libraries don't support Parquet at this time, so I plan
> to first use the Arrow Java libraries for the Avro->Arrow transform and then
> use Python Arrow for the Arrow->Parquet transform.
>
> On the Java side I plan to download my Avro objects to a file, then create
> the Arrow files and then upload these.
>
> I see AvroToArrow.avroToArrowIterator(schema, decoder, config) and the
> tests that use AvroToArrow, but even after reading the limited
> documentation I am not sure how to go about using it to read the Avro
> files and write the output Arrow file.
>
> Can someone provide me with an example?