arrow-user mailing list archives

From Paul Whalen <pgwha...@gmail.com>
Subject Java dataframe library for arrow suggestions
Date Tue, 16 Mar 2021 23:06:05 GMT
Hi,

I've been using Arrow for some time now, mostly in the context of Arrow
Flight between Java and Python.  While it's quite easy to convert Arrow
data in Python to a pandas dataframe and manipulate it, I'm struggling to
find an obvious analogue on the Java side.  VectorSchemaRoot is useful for
loading/unloading/moving data, but clumsy for doing higher-level
operations, especially joins/aggregations/etc across "tables".

In other words, if I wanted to load non-Arrow-formatted data from somewhere
into Java, manipulate it with a dataframe-like API, and then send the
result somewhere via Flight, what library would be the best/simplest way to
accomplish that?  I see lots of progress in other languages, but I'm
wondering what would be recommended for Java.

I'm currently looking at using Spark SQL in-application, but that seems a
touch heavyweight, and I'm not sure it would do exactly what I've described
(nor am I terribly familiar with Spark in the first place).

If the premise of this question is flawed, please feel free to correct me.

Thanks!
Paul
