arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: Example for Apache Arrow Flight
Date Mon, 18 May 2020 03:36:05 GMT
Hi Damien,
I'm not an expert, but I believe populating the bytes field should be all
that is necessary.  If you want to be sure, there are existing Flight
integration tests that you could substitute your final code into [1][2].

It is worth noting that quite a bit of effort went into avoiding memory
copies when sending bytes over the wire with gRPC in Java (the logic can be
seen in ArrowMessage [3]).

To answer your other questions:

>    - What is the data we must set in dataBody ?

I believe this is a serialized Message.  VectorUnloader handles extracting
data from VectorSchemaRoot for serialization.

>    - If those are ArrowMessages, how can I map a ResultSet to this type?

ResultSet -> VectorSchemaRoot (via the JDBC contrib library) ->
VectorUnloader -> bytes.  It is worth noting that I think we left the JDBC
library in a state where it creates a new VectorSchemaRoot each time.  If
this is the case, we should probably change it to reuse an existing
VectorSchemaRoot.
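
As a minimal sketch of the last two steps of that pipeline (VectorSchemaRoot -> VectorUnloader -> bytes), the Java below unloads a root into an ArrowRecordBatch and serializes it with MessageSerializer. The class name UnloadSketch, its helper methods, and the hand-built single-column root are illustrative assumptions; in the scenario from this thread the root would come from the JDBC adapter instead. It assumes arrow-vector and an arrow-memory implementation are on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.util.Collections;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.WriteChannel;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.ipc.message.MessageSerializer;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

// Hypothetical class name; not part of the Arrow code base.
public class UnloadSketch {

    // Builds a three-row root with one nullable int32 column named "id".
    // Stand-in for a root produced by the JDBC adapter.
    static VectorSchemaRoot sampleRoot(BufferAllocator allocator) {
        Schema schema = new Schema(Collections.singletonList(
            Field.nullable("id", new ArrowType.Int(32, true))));
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
        IntVector ids = (IntVector) root.getVector("id");
        ids.allocateNew(3);
        for (int i = 0; i < 3; i++) {
            ids.set(i, i + 1);
        }
        root.setRowCount(3);
        return root;
    }

    // Unloads the current contents of the root into a record batch and
    // writes it as an encapsulated IPC message into a byte array.
    static byte[] serializeBatch(VectorSchemaRoot root) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        VectorUnloader unloader = new VectorUnloader(root);
        try (ArrowRecordBatch batch = unloader.getRecordBatch()) {
            MessageSerializer.serialize(
                new WriteChannel(Channels.newChannel(out)), batch);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             VectorSchemaRoot root = sampleRoot(allocator)) {
            byte[] bytes = serializeBatch(root);
            System.out.println("serialized " + bytes.length + " bytes");
        }
    }
}
```

One caveat: this writes the whole encapsulated message (metadata followed by body). My reading of Flight.proto is that FlightData carries the message metadata in data_header and only the buffers in data_body, so a hand-written gRPC service would probably need to split the two rather than put these bytes into data_body as they are. Please check this against the integration tests before relying on it.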

It might also be useful to review the Java prose documentation which is
linked from the java README [4].

Hope this helps.

Micah

[1]
https://github.com/apache/arrow/blob/0188e45bbe688c45f10032e9c37fbedab755fa71/java/flight/flight-core/src/main/java/org/apache/arrow/flight/example/integration/IntegrationTestServer.java
[2]
https://github.com/apache/arrow/blob/3567dcf3a314f009e56b4e310f5eaa0e49c6469c/dev/archery/archery/integration/tester_java.py

[3]
https://github.com/apache/arrow/blob/39cc7434479ddce565e0a0fe35f416a9cd992700/java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java
[4] https://mail.google.com/mail/u/0/#inbox/FMfcgxwHNMZSxGXMbrZDFdwvrdZfGTdF




On Tue, May 12, 2020 at 7:06 PM Damien Chaillou <damien.chaillou@gmail.com>
wrote:

> Hi Andy,
> thanks for the VERY quick response and links, I will study them.
> However, I had in mind an implementation like the one you wrote in the
> Rust examples folder [1], that is, implementing the FlightService trait
> from the gRPC service. This is why I meant to build a FlightData directly
> [2]. Would it mean I "just" have to transform my VectorSchemaRoot into a
> ByteString (set in the dataBody of the FlightData object) that will be
> streamed over gRPC?
>
> (I wanted to use Scala AkkaStream gRPC to implement my server, because I
> don't think *org.apache.arrow.flight.FlightServer* will fit my needs.)
>
> Or maybe I didn't understand how a Flight server should be implemented
> and I got it wrong?
>
> Cheers,
>
> Damien
>
> [1]
> https://github.com/andygrove/arrow/blob/master/rust/datafusion/examples/flight_server.rs
> [2] https://github.com/apache/arrow/blob/master/format/Flight.proto#L300
>
> On Tue, May 12, 2020 at 21:58, Andy Grove <andygrove73@gmail.com> wrote:
>
>> Hi Damien,
>>
>> Here is a brief answer that hopefully at least points you in the right
>> direction.
>>
>> You need to use the VectorSchemaRoot class to build batches of data.
>> There is some documentation on how to do that [1]. Then, in your
>> FlightProducer implementation, you need to pass the batches to the
>> FlightProducer.ServerStreamListener using the "start" and "next" methods
>> when batches are ready to be sent. There is sample code in the Arrow repo
>> [2] and there is a Kotlin example that I wrote here [3].
>>
>> Andy.
>>
>> [1] https://arrow.apache.org/docs/java/ipc.html
>> [2]
>> https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/example/ExampleFlightServer.java
>> [3]
>> https://github.com/ballista-compute/ballista/blob/master/jvm/executor/src/main/kotlin/BallistaFlightProducer.kt
>>
>> On Tue, May 12, 2020 at 6:43 PM Damien Chaillou <
>> damien.chaillou@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I'm currently playing with Apache Arrow Flight in Java and can't get my
>>> head around how to implement something.
>>> In the *doGet* method, for example, I'm doing simple JDBC calls whose
>>> results I would like to stream.
>>> If I understand correctly, the *FlightData*'s body should be an
>>> ArrowMessage serialised as a ByteString (?) that I build from the
>>> ResultSet of my JDBC call. I thought of using the JdbcToArrow
>>> <https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java#L149>
>>> helper class, but I can't find any example of how to do such a thing.
>>> I came up with a few questions:
>>>
>>>    - What is the data we must set in dataBody ?
>>>    - If those are ArrowMessages, how can I map a ResultSet to this type?
>>>    - How do we serialise to ByteString objects typed like ArrowMessage,
>>>    Schema ... ?
>>>
>>>
>>> Could anyone point me to a piece of code, a blog post, or anything else, please?
>>>
>>> My toy project would be a generic proxy server in front of any database
>>> with available JDBC drivers that could stream queries over Arrow Flight
>>> (gRPC).
>>>
>>> Cheers, thanks!
>>>
>>>
>>> Damien
>>>
>>

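For the FlightProducer route described in Andy's quoted reply, here is a rough Java sketch of a producer that sends one batch through FlightProducer.ServerStreamListener and a client that reads it back. The class names ProducerSketch and OneBatchProducer, the "id" column, and the empty ticket are illustrative assumptions. The listener method I know of is putNext(), which I take to be the "next" step mentioned in that reply. It assumes flight-core (with its gRPC dependencies) and an arrow-memory implementation are on the classpath.

```java
import java.util.Collections;

import org.apache.arrow.flight.FlightClient;
import org.apache.arrow.flight.FlightServer;
import org.apache.arrow.flight.FlightStream;
import org.apache.arrow.flight.Location;
import org.apache.arrow.flight.NoOpFlightProducer;
import org.apache.arrow.flight.Ticket;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

// Hypothetical class names; not part of the Arrow code base.
public class ProducerSketch {

    // Answers every getStream call with a single three-row batch.
    static class OneBatchProducer extends NoOpFlightProducer {
        private final BufferAllocator allocator;

        OneBatchProducer(BufferAllocator allocator) {
            this.allocator = allocator;
        }

        @Override
        public void getStream(CallContext context, Ticket ticket,
                              ServerStreamListener listener) {
            Schema schema = new Schema(Collections.singletonList(
                Field.nullable("id", new ArrowType.Int(32, true))));
            try (VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
                listener.start(root);            // announces the schema
                IntVector ids = (IntVector) root.getVector("id");
                ids.allocateNew(3);
                for (int i = 0; i < 3; i++) {
                    ids.set(i, i + 1);
                }
                root.setRowCount(3);
                listener.putNext();              // sends the root's current contents
                listener.completed();            // ends the stream
            }
        }
    }

    // Starts an in-process server on a free port, fetches the stream, and
    // returns the total number of rows received.
    static int fetchRowCount() throws Exception {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             FlightServer server = FlightServer.builder(
                     allocator,
                     Location.forGrpcInsecure("localhost", 0),
                     new OneBatchProducer(allocator)).build().start();
             FlightClient client = FlightClient.builder(
                     allocator,
                     Location.forGrpcInsecure("localhost", server.getPort())).build();
             FlightStream stream = client.getStream(new Ticket(new byte[0]))) {
            int rows = 0;
            while (stream.next()) {
                rows += stream.getRoot().getRowCount();
            }
            return rows;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rows received: " + fetchRowCount());
    }
}
```

With a JDBC source, the loop that fills the root would be replaced by whatever loads each ResultSet chunk into the same root, followed by one putNext() per chunk.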