arrow-dev mailing list archives

From Li Jin <ice.xell...@gmail.com>
Subject Re: Serialize/deserialize ArrowRecordBatch to/from bytes?
Date Wed, 26 Apr 2017 17:31:33 GMT
Thanks for the various pointers. I was looking at ArrowFileWriter/Reader
and got a little bit confused.

So what I am trying to do is convert a list of Spark rows into some
Arrow format in Java (I will probably go with the file format for now),
send the bytes to Python, and deserialize them into a pyarrow table.

This is what I currently plan to do:
(1) convert the rows to one or more Arrow record batches (using the
ValueVectors)
(2) serialize the Arrow record batches and send them over to Python (not
sure what to use here, ArrowFileWriter?)
(3) deserialize the bytes into a pyarrow.Table using pyarrow.FileReader

I *think* ArrowFileWriter is what I should use to send data over in (2),
but:
(1) I would need to turn the Arrow record batches into a VectorSchemaRoot
by doing something like this:
https://github.com/icexelloss/spark/blob/pandas-udf/sql/core/src/test/scala/org/apache/spark/sql/ArrowConvertersSuite.scala#L226
(2) I am not sure how to write all the data in a VectorSchemaRoot using
ArrowFileWriter.

Does this sound like the right thing to do?

Thanks,
Li

On Tue, Apr 25, 2017 at 8:52 PM, Wes McKinney <wesmckinn@gmail.com> wrote:

> Also, now that we have a website that is easier to write content for (in
> Markdown), it would be great if some Java developers could volunteer some
> time to write user-facing documentation to go with the Javadocs.
>
> On Tue, Apr 25, 2017 at 8:51 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
>
> > There is also
> > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/file/TestArrowStreamPipe.java
> >
> > On Tue, Apr 25, 2017 at 8:46 PM, Li Jin <ice.xelloss@gmail.com> wrote:
> >
> >> Thanks Julien. I will follow
> >> https://github.com/apache/arrow/blob/990e2bde758ac8bc6e4497ae1bc37f89b71bb5cf/java/vector/src/test/java/org/apache/arrow/vector/stream/MessageSerializerTest.java#L91
> >>
> >
> >
>
