arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: Avro and Thrift converters
Date Thu, 21 May 2020 16:35:58 GMT
Hi Cindy,

> Are you saying that the Avro -> Arrow converter is already available in
> release 0.17.1?


Yes, in Java [1]. It exists in a separate POM [2]. Note that this is
still in an experimental/contrib state (i.e. I'm not sure if anyone is
using it in production) and it might get some refactoring, but it should
be a good place to start experimenting, and feedback on it would be
welcome.

> As for use cases: we're trying to move away from Thrift in parts of our ML
> stack.  We need to support wide, row-based data with schema support, so
> probably need to convert Thrift to Avro.  However, we'd love to use Arrow
> *between* components (Spark, TensorFlow, scikit-learn), but it's likely
> our data will originate in Avro and/or Thrift.

Thanks.  Like I said, I hope to work a little bit on the C++/Python side
of Avro to Arrow, but I can't give an exact time frame for it.  Thrift, I
think, is more complicated since it seems like there are multiple protocols
that would likely need support.  But contributions are welcome :)

Hope this helps.

Micah

[1] https://arrow.apache.org/docs/java/org/apache/arrow/AvroToArrow.html
[2] https://mvnrepository.com/artifact/org.apache.arrow/arrow-avro

On Wed, May 20, 2020 at 12:36 PM Cindy McMullen <cmcmullen@twitter.com>
wrote:

> Hi, Micah -
>
> I wasn't aware that the Avro converter already existed in Java, since I
> couldn't find any Arrow docs on it. I was going by the Arrow/JIRA release
> tag.  Are you saying that the Avro -> Arrow converter is already available
> in release 0.17.1?
>
> As for use cases: we're trying to move away from Thrift in parts of our ML
> stack.  We need to support wide, row-based data with schema support, so
> probably need to convert Thrift to Avro.  However, we'd love to use Arrow
> *between* components (Spark, TensorFlow, scikit-learn), but it's likely
> our data will originate in Avro and/or Thrift.
>
> Thanks -
>
> -- Cindy
>
> On Wed, May 20, 2020 at 1:14 PM Micah Kornfield <emkornfield@gmail.com>
> wrote:
>
>> The Avro to Arrow converter in C++/Python will not be done anytime soon
>> unless someone else takes it up (one exists in Java).  It has been on my
>> low priority backlog for a while but I haven't had time to get to it.  We
>> should remove a specific release tag from it.
>>
>> As far as I know there are no plans for Thrift or other formats at this
>> point.
>>
>> May I ask what your use case is?
>>
>> Thanks,
>> Micah
>>
>> On Wednesday, May 20, 2020, Cindy McMullen <cmcmullen@twitter.com> wrote:
>>
>>> I see that the Avro converter is planned for Arrow 1.0.0.  Any ideas
>>> about when that release might be?
>>>
>>> Any plans for a Thrift -> Avro converter?
>>>
>>