From: Micah Kornfield
Reply-To: emkornfield@gmail.com
Date: Sun, 17 May 2020 20:36:05 -0700
Subject: Re: Example for Apache Arrow Flight
To: user@arrow.apache.org

Hi Damien,

I'm not an expert, but I believe populating the bytes field should be all
that is necessary. If you want to be sure, there are existing Flight
integration tests that you could substitute your final code into [1][2].

It is worth noting that quite a bit of effort went into avoiding memory
copies when sending bytes over the wire with gRPC in Java (the logic can be
seen in ArrowMessage [3]).

To answer your other questions:

> - What is the data we must set in dataBody ?

I believe this is a serialized Message. VectorUnloader handles extracting
data from VectorSchemaRoot for serialization.

> - If those are ArrowMessages, how can I map a ResultSet to this type?

ResultSet -> VectorSchemaRoot (via the JDBC contrib library) ->
VectorUnloader -> Bytes. It is worth noting that I think we left the JDBC
library in a state where it creates a new VectorSchemaRoot each time. If
this is the case, we should probably change it to reuse an existing
VectorSchemaRoot.

It might also be useful to review the Java prose documentation, which is
linked from the Java README [4].

Hope this helps.
Micah

[1] https://github.com/apache/arrow/blob/0188e45bbe688c45f10032e9c37fbedab755fa71/java/flight/flight-core/src/main/java/org/apache/arrow/flight/example/integration/IntegrationTestServer.java
[2] https://github.com/apache/arrow/blob/3567dcf3a314f009e56b4e310f5eaa0e49c6469c/dev/archery/archery/integration/tester_java.py
[3] https://github.com/apache/arrow/blob/39cc7434479ddce565e0a0fe35f416a9cd992700/java/flight/flight-core/src/main/java/org/apache/arrow/flight/ArrowMessage.java
[4] https://mail.google.com/mail/u/0/#inbox/FMfcgxwHNMZSxGXMbrZDFdwvrdZfGTdF

On Tue, May 12, 2020 at 7:06 PM Damien Chaillou wrote:

> Hi Andy,
> thanks for the VERY quick response and links, I will study them.
> However, I had in mind an implementation like the example you implemented
> in the Rust example folder [1], I mean implementing the FlightService trait
> from the gRPC service. This is why I meant to build a FlightData directly
> [2]. Would it mean I "just" have to transform my VectorSchemaRoot into a
> ByteString (set in the dataBody of the FlightData object) that will be
> streamed over gRPC?
>
> (I wanted to use Scala AkkaStream gRPC to implement my server, because I
> don't think *org.apache.arrow.flight.FlightServer* will fit my needs.)
>
> Or maybe I didn't understand how a Flight server should be implemented and
> I got it wrong?
>
> Cheers,
>
> Damien
>
> [1] https://github.com/andygrove/arrow/blob/master/rust/datafusion/examples/flight_server.rs
> [2] https://github.com/apache/arrow/blob/master/format/Flight.proto#L300
>
> On Tue, May 12, 2020 at 21:58, Andy Grove wrote:
>
>> Hi Damien,
>>
>> Here is a brief answer that hopefully at least points you in the right
>> direction.
>>
>> You need to use the VectorSchemaRoot class to build batches of data.
>> There is some documentation on how to do that [1]. Then, in your
>> FlightProducer implementation, you need to pass the batches to the
>> FlightProducer.ServerStreamListener using the "start" and "next" methods
>> when batches are ready to be sent. There is sample code in the Arrow repo
>> [2] and there is a Kotlin example that I wrote here [3].
>>
>> Andy.
>>
>> [1] https://arrow.apache.org/docs/java/ipc.html
>> [2] https://github.com/apache/arrow/blob/master/java/flight/flight-core/src/main/java/org/apache/arrow/flight/example/ExampleFlightServer.java
>> [3] https://github.com/ballista-compute/ballista/blob/master/jvm/executor/src/main/kotlin/BallistaFlightProducer.kt
>>
>> On Tue, May 12, 2020 at 6:43 PM Damien Chaillou <
>> damien.chaillou@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I'm currently playing with Apache Arrow Flight in Java and can't get my
>>> head around how to implement something.
>>> In the *doGet* method, for example, I'm doing simple JDBC calls that I
>>> would like to stream over.
>>> If I understand correctly, the *FlightData*'s body should be an
>>> ArrowMessage serialised as a ByteString (?) that I built from a ResultSet
>>> from my JDBC call. I thought of using the JdbcToArrow helper class to
>>> help, but I can't find any example of how to do such a thing.
>>> I came up with a few questions:
>>>
>>> - What is the data we must set in dataBody ?
>>> - If those are ArrowMessages, how can I map a ResultSet to this type?
>>> - How do we serialise to ByteString objects typed like ArrowMessage,
>>> Schema ... ?
>>>
>>>
>>> Could anyone point me to a piece of code/blog post/anything, please?
>>>
>>> My toy project would be a generic proxy server in front of any database
>>> with available JDBC drivers that could stream queries over Arrow Flight
>>> (gRPC).
>>>
>>> Cheers, thanks!
>>>
>>>
>>> Damien
>>>
>>
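For readers following the thread, the control flow Andy describes (announce the schema once, push each batch as soon as it is ready, then signal completion) can be sketched without any Arrow dependency. This is only an illustration under stated assumptions: `BatchListener`, `streamInBatches`, and `RecordingListener` are hypothetical names invented here, rows are plain Java lists rather than a VectorSchemaRoot, and the interface merely mirrors the shape of the "start" and "next" calls mentioned above. It is not the Arrow Flight API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchStreamSketch {

    /** Hypothetical stand-in for a stream listener; NOT the Arrow Flight API. */
    interface BatchListener {
        void start(List<String> columnNames);  // announce the schema once
        void next(List<List<Object>> batch);   // hand over one ready batch
        void completed();                      // signal that no more batches follow
    }

    /** Groups rows into batches of at most batchSize and pushes each batch once it is full. */
    static void streamInBatches(List<String> columnNames,
                                Iterable<List<Object>> rows,
                                int batchSize,
                                BatchListener listener) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        listener.start(columnNames);
        List<List<Object>> batch = new ArrayList<>(batchSize);
        for (List<Object> row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                listener.next(batch);
                // Start a fresh list so the batch already handed over is not mutated.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            listener.next(batch); // flush the final, partially filled batch
        }
        listener.completed();
    }

    /** Listener that records the calls it receives, for demonstration only. */
    static class RecordingListener implements BatchListener {
        final List<String> events = new ArrayList<>();

        @Override
        public void start(List<String> columnNames) {
            events.add("start" + columnNames);
        }

        @Override
        public void next(List<List<Object>> batch) {
            events.add("next(" + batch.size() + ")");
        }

        @Override
        public void completed() {
            events.add("completed");
        }
    }

    public static void main(String[] args) {
        // Three rows standing in for what a JDBC ResultSet would yield.
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList(1, "a"),
                Arrays.<Object>asList(2, "b"),
                Arrays.<Object>asList(3, "c"));
        RecordingListener listener = new RecordingListener();
        streamInBatches(Arrays.asList("id", "name"), rows, 2, listener);
        System.out.println(listener.events);
    }
}
```

In a real server the row lists would presumably be replaced by a VectorSchemaRoot filled from the ResultSet, and the listener by the one Flight passes to the producer; the sample code linked in the thread is the place to check the actual method names.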