arrow-user mailing list archives

From: Cindy McMullen <cmcmul...@twitter.com>
Subject: Re: Streaming use cases
Date: Mon, 29 Jun 2020 21:15:33 GMT

Hi, Wes -

Yes, we're using Java/Scala, but we also have a sizable Python code base for
our data scientists.  Our goal is to replace Thrift as the storage and
representation format for ML features with a more OSS-friendly format, such
as Parquet or Avro, and to avoid writing multiple adapters.

Ideally, we could stream data from Parquet files on disk, in batches, into
Arrow-compatible consumers.  Is this a reasonable fit for something like
Arrow Flight?


On Mon, Jun 29, 2020 at 2:37 PM Wes McKinney <wesmckinn@gmail.com> wrote:

> hi Cindy,
>
> Could you clarify which PL you are working in (though assuming Scala /
> Java judging by your e-mail address)?
>
> In C++ we have reasonably mature Parquet->Arrow reading but not yet
> conversion from Arrow to Avro. In Java, I am not sure what is the
> state of the art for getting Parquet into Arrow but this code does not
> live in Apache Arrow -- I know that Apache Iceberg has done some work
> around this but I'm not sure how consumable it is as a library.
> Java-Arrow does have some preliminary support for converting Arrow to
> Avro, I believe. So there's some engineering here to do in any case.
>
> best,
> Wes
>
> On Mon, Jun 29, 2020 at 2:45 PM Cindy McMullen <cmcmullen@twitter.com>
> wrote:
> >
> > Can I use Arrow to stream data from a Parquet file source and consume it
> > via Avro?
>
