arrow-dev mailing list archives

From Antoine Pitrou <anto...@python.org>
Subject Re: Use arrow as a general data serialization framework in distributed stream data processing
Date Fri, 26 Apr 2019 08:16:02 GMT

It's "arbitrary" from Arrow's point of view, because Arrow itself cannot
represent this data (except as a binary blob).  Though, as Micah said,
this may change at some point.

Instead of extending Arrow to fit this use case, perhaps it would be
better to write a separate library that sits atop Arrow for your purposes?

Regards

Antoine.


On 26/04/2019 at 04:20, Shawn Yang wrote:
> Hi Antoine,
> It's not an arbitrary data type; the types are similar to the data types in
> https://spark.apache.org/docs/latest/sql-reference.html#data-types and
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/sql.html#data-types.
> Our framework is similar to Flink streaming, but is written in
> C++/Java/Python. Data needs to be transferred from a Java process to a
> Python process over TCP, or through shared memory if they are on the same
> machine. For example, one case is online learning: the features are
> generated in Java streaming, and the training data is then transferred to a
> Python TensorFlow worker for training.
> In a system such as Flink, data is processed row by row, not in columns, so
> a serialization framework is needed to serialize data row by row in a
> language-independent way for C++/Java/Python.
> 
> Regards
