arrow-dev mailing list archives

From Julien Le Dem <jul...@dremio.com>
Subject Re: Serialize/deserialize ArrowRecordBatch to/from bytes?
Date Wed, 26 Apr 2017 21:21:50 GMT
Example of writing to and reading from a file:
https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/file/TestArrowFile.java
Similarly, in case you don't want to go through a file:
Unloading a vector into buffers and loading from buffers:
https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestVectorUnloadLoad.java
The VectorLoader/VectorUnloader are also what is used to read/write files.

On Wed, Apr 26, 2017 at 10:31 AM, Li Jin <ice.xelloss@gmail.com> wrote:

> Thanks for the various pointers. I was looking at ArrowFileWriter/Reader
> and got a little bit confused.
>
> So what I am trying to do is to convert a list of spark rows into some
> arrow format in java ( I will probably go with the file format for now),
> send the bytes to python, deserialize it into a pyarrow table.
>
> This is what I currently plan to do:
> (1) convert the rows to one or more Arrow record batches (using the
> ValueVectors)
> (2) serialize the Arrow record batches and send them over to python (not sure
> what to use here, ArrowFileWriter?)
> (3) deserialize the bytes into pyarrow.Table using pyarrow.FileReader
>
> I *think* ArrowFileWriter is what I should use to send data over in (2),
> but:
> (1) I would need to turn the Arrow record batches into a VectorSchemaRoot
> by doing something like this:
> https://github.com/icexelloss/spark/blob/pandas-udf/sql/core/src/test/scala/org/apache/spark/sql/ArrowConvertersSuite.scala#L226
> (2) I am not sure how to write all the data in a VectorSchemaRoot using
> ArrowFileWriter.
>
> Does this sound like the right thing to do?
>
> Thanks,
> Li
>
> On Tue, Apr 25, 2017 at 8:52 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
>
> > Also, now that we have a website that is easier to write content for (in
> > Markdown), it would be great if some Java developers could volunteer some
> > time to write user-facing documentation to go with the Javadocs.
> >
> > On Tue, Apr 25, 2017 at 8:51 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> >
> > > There is also
> > > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/file/TestArrowStreamPipe.java
> > >
> > > On Tue, Apr 25, 2017 at 8:46 PM, Li Jin <ice.xelloss@gmail.com> wrote:
> > >
> > >> Thanks Julien. I will follow
> > >> https://github.com/apache/arrow/blob/990e2bde758ac8bc6e4497ae1bc37f89b71bb5cf/java/vector/src/test/java/org/apache/arrow/vector/stream/MessageSerializerTest.java#L91
> > >>
> > >
> > >
> >
>



-- 
Julien
