arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: Java dataframe library for arrow suggestions
Date Tue, 16 Mar 2021 23:35:20 GMT
This has been asked several times in the past, but I'm not aware of
anything "dataframe-like" in Java that's built against Arrow (or
otherwise) and fills the kind of need that pandas does. Some years ago
there was a Scala project, Saddle [1] (not Arrow-based), built
initially by one of the early pandas developers, but I don't think it's
still being actively developed. A higher-level Java API built on
top of the Arrow Java libraries would be incredibly useful to the
community, I'm sure.

[1]: https://github.com/saddle/saddle

On Tue, Mar 16, 2021 at 5:06 PM Paul Whalen <pgwhalen@gmail.com> wrote:
>
> Hi,
>
> I've been using Arrow for some time now, mostly in the context of Arrow Flight between
> Java and Python.  While it's quite easy to convert Arrow data in Python to a pandas dataframe
> and manipulate it, I'm struggling to find an obvious analogue on the Java side.  VectorSchemaRoot
> is useful for loading/unloading/moving data, but clumsy for doing higher level operations,
> especially joins/aggregations/etc across "tables".
>
> In other words, if I wanted to load non-Arrow-formatted data from somewhere into Java,
> manipulate it with a dataframe-like API, and then send the result somewhere via Flight, what
> library would be the best/simplest way to accomplish that?  I see lots of progress in other
> languages, but I'm wondering what would be recommended for Java.
>
> I'm currently looking at Spark SQL just in-application, but that seems a touch heavyweight,
> and I'm not sure it would do exactly what I've described (nor am I terribly familiar with
> Spark in the first place).
>
> If the premise of this question is flawed, please feel free to correct me.
>
> Thanks!
> Paul
