This isn't directly related to the question, but I was reading about the newly released JDK 16 today, which includes initial (incubator) support for explicit vectorized operations through the Vector API. That might be interesting to explore for anyone considering building a Java DataFrame implementation.

https://openjdk.java.net/jeps/338
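As a rough idea of what JEP 338 enables for columnar data, here is a minimal sketch that sums a `double[]` column with the incubating `jdk.incubator.vector` module. It assumes JDK 16 or later and must be compiled and run with `--add-modules jdk.incubator.vector`. The class name `VectorSum` is only illustrative, and because the module is incubating, the API may change between releases.

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

/**
 * Illustrative column sum using the incubating Vector API (JEP 338).
 * Requires: --add-modules jdk.incubator.vector (compile and run).
 */
public final class VectorSum {
    // The preferred (typically widest efficient) vector shape for doubles on this CPU.
    private static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_PREFERRED;

    public static double sum(double[] values) {
        DoubleVector acc = DoubleVector.zero(SPECIES);
        int i = 0;
        // loopBound rounds the length down to a multiple of the lane count.
        int upperBound = SPECIES.loopBound(values.length);
        for (; i < upperBound; i += SPECIES.length()) {
            // Add one full vector's worth of elements to the per-lane accumulators.
            acc = acc.add(DoubleVector.fromArray(SPECIES, values, i));
        }
        // Collapse the lanes into a single scalar.
        double total = acc.reduceLanes(VectorOperators.ADD);
        // Handle the leftover tail that did not fill a whole vector.
        for (; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] column = new double[1000];
        for (int k = 0; k < column.length; k++) {
            column[k] = k;
        }
        System.out.println(sum(column)); // 499500.0
    }
}
```

Note that the lane-wise accumulation adds elements in a different order than a plain scalar loop, so for arbitrary floating-point data the result can differ slightly from a sequential sum.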

On Tue, Mar 16, 2021 at 5:43 PM Andrew Melo <andrew.melo@gmail.com> wrote:
I can't speak to how complete it is, but I looked earlier for
something similar and ran across
https://github.com/deeplearning4j/nd4j. It's probably not an exact
fit, but it does appear to be able to consume Arrow buffers and expose
them to Java.

Cheers
Andrew

On Tue, Mar 16, 2021 at 6:36 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>
> This has been asked several times in the past, but I'm not aware of
> anything "dataframe-like" in Java that's built against Arrow (or
> otherwise) that fills the kind of need that pandas does. There was a
> Scala project some years ago, Saddle [1] (not Arrow-based), built
> initially by one of the early pandas developers, but I don't think it's
> still being actively developed. A higher-level Java API built on
> top of the Arrow Java libraries would be incredibly useful to the
> community, I'm sure.
>
> [1]: https://github.com/saddle/saddle
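To make the "higher-level API" idea a little more concrete, here is a minimal sketch of one such operation, a group-by sum over two columns, in plain Java. The names `GroupBySketch` and `groupBySum` are hypothetical and not part of Arrow Java or any existing library; the columns are plain Java arrays here, whereas an Arrow-backed version would presumably read from the vectors held by a `VectorSchemaRoot` instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper illustrating a group-by aggregation; not an Arrow or pandas API. */
public final class GroupBySketch {

    /** Sums values[i] for each distinct keys[i], keeping keys in first-seen order. */
    public static Map<String, Long> groupBySum(String[] keys, long[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        Map<String, Long> sums = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // merge() stores the value for a new key, or adds it to the existing sum.
            sums.merge(keys[i], values[i], Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] symbol = {"AAPL", "MSFT", "AAPL", "GOOG", "MSFT"};
        long[] qty = {10, 5, 7, 3, 1};
        System.out.println(groupBySum(symbol, qty)); // {AAPL=17, MSFT=6, GOOG=3}
    }
}
```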
>
> On Tue, Mar 16, 2021 at 5:06 PM Paul Whalen <pgwhalen@gmail.com> wrote:
> >
> > Hi,
> >
> > I've been using Arrow for some time now, mostly in the context of Arrow Flight between Java and Python. While it's quite easy to convert Arrow data in Python to a pandas DataFrame and manipulate it, I'm struggling to find an obvious analogue on the Java side. VectorSchemaRoot is useful for loading/unloading/moving data, but clumsy for higher-level operations, especially joins, aggregations, etc. across "tables".
> >
> > In other words, if I wanted to load non-Arrow-formatted data from somewhere into Java, manipulate it with a dataframe-like API, and then send the result somewhere via Flight, what library would be the best/simplest way to accomplish that? I see lots of progress in other languages, but I'm wondering what would be recommended for Java.
> >
> > I'm currently looking at Spark SQL just in-application, but that seems a touch heavyweight, and I'm not sure it would do exactly what I've described (nor am I terribly familiar with Spark in the first place).
> >
> > If the premise of this question is flawed, please feel free to correct me.
> >
> > Thanks!
> > Paul
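For the joins mentioned in the question, one common in-memory approach is a hash join: build a hash table over one table's key column, then probe it with the other table's keys. Below is a minimal sketch in plain Java, assuming both tables are represented as column arrays; `HashJoinSketch` and `innerJoin` are hypothetical names for illustration, not an Arrow, Flight, or Spark API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical inner equi-join over key columns; an illustration, not a library API. */
public final class HashJoinSketch {

    /** Returns {leftRow, rightRow} index pairs whose keys are equal. */
    public static List<int[]> innerJoin(long[] leftKeys, long[] rightKeys) {
        // Build phase: map each right-side key to every row index that holds it.
        Map<Long, List<Integer>> index = new HashMap<>();
        for (int r = 0; r < rightKeys.length; r++) {
            index.computeIfAbsent(rightKeys[r], k -> new ArrayList<>()).add(r);
        }
        // Probe phase: for each left row, emit one pair per matching right row.
        List<int[]> matches = new ArrayList<>();
        for (int l = 0; l < leftKeys.length; l++) {
            List<Integer> rows = index.get(leftKeys[l]);
            if (rows != null) {
                for (int r : rows) {
                    matches.add(new int[] {l, r});
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] orderCustomerId = {1, 2, 1, 3};
        long[] customerId = {1, 2, 4};
        String[] customerName = {"alice", "bob", "dave"};
        // Order row 3 (customer 3) has no match and is dropped by the inner join.
        for (int[] m : innerJoin(orderCustomerId, customerId)) {
            System.out.println("order row " + m[0] + " -> " + customerName[m[1]]);
        }
    }
}
```

Returning row-index pairs rather than materialized rows keeps the join independent of how the other columns are stored; the same indices can then be used to gather values from any column on either side.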