arrow-user mailing list archives

From Cindy McMullen <cmcmul...@twitter.com>
Subject Re: Streaming use cases
Date Tue, 30 Jun 2020 15:01:57 GMT
Hi, Micah -

I see the Avro*Consumer classes in the javadocs
<https://arrow.apache.org/docs/java/>, which would lead me to believe we
have Arrow to Avro capability.  What am I missing?

On Mon, Jun 29, 2020 at 9:33 PM Micah Kornfield <emkornfield@gmail.com>
wrote:

> Just a clarification: the functionality in Java is from Avro to Arrow (not
> Arrow to Avro).
>
>
>
> On Mon, Jun 29, 2020 at 2:25 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>
>> On Mon, Jun 29, 2020 at 4:15 PM Cindy McMullen <cmcmullen@twitter.com>
>> wrote:
>> >
>> > Hi, Wes -
>> >
>> > Yes, we're using Java/Scala, but also have a good Python code base for
>> > our data scientists.  Our goal is to replace storage/representation of
>> > Thrift for ML features with some more OSS-friendly format, such as Parquet
>> > or Avro, and avoid writing multiple adapters.
>> >
>> > Ideally, we could stream data from Parquet files on disk in batches into
>> > Arrow-compatible consumers.  Is this a reasonable fit for something like
>> > Arrow Flight?
>>
>> Yes, Flight is definitely designed for that -- fast / efficient
>> delivery of Arrow record batches over TCP.
>>
>> >
>> > On Mon, Jun 29, 2020 at 2:37 PM Wes McKinney <wesmckinn@gmail.com>
>> > wrote:
>> >>
>> >> hi Cindy,
>> >>
>> >> Could you clarify which programming language you are working in (I'm
>> >> assuming Scala / Java, judging by your e-mail address)?
>> >>
>> >> In C++ we have reasonably mature Parquet->Arrow reading but not yet
>> >> conversion from Arrow to Avro. In Java, I am not sure what the state
>> >> of the art is for getting Parquet into Arrow, but this code does not
>> >> live in Apache Arrow -- I know that Apache Iceberg has done some work
>> >> around this but I'm not sure how consumable it is as a library.
>> >> Java-Arrow does have some preliminary support for converting Arrow to
>> >> Avro, I believe. So there's some engineering here to do in any case.
>> >>
>> >> best,
>> >> Wes
>> >>
>> >> On Mon, Jun 29, 2020 at 2:45 PM Cindy McMullen <cmcmullen@twitter.com>
>> >> wrote:
>> >> >
>> >> > Can I use Arrow to stream data from a Parquet file source and
>> >> > consume it via Avro?
>>
>
