arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: Java dataframe library for arrow suggestions
Date Wed, 17 Mar 2021 04:01:57 GMT
There was some effort previously in Arrow to start building this out (see
the algorithms package), but we tabled it due to the large scope and the
limited availability of maintainers for it.

On Tue, Mar 16, 2021 at 4:36 PM Wes McKinney <wesmckinn@gmail.com> wrote:

> This has been asked several times in the past, but I'm not aware of
> anything "dataframe-like" in Java that's built against Arrow (or
> otherwise) that fills the kind of need that pandas does. There was a
> Scala project some years ago, Saddle [1] (not Arrow-based), built
> initially by one of the early pandas developers, but I don't think it's
> still being actively developed. Building a higher-level Java API on
> top of the Arrow Java libraries would be incredibly useful to the
> community, I'm sure.
>
> [1]: https://github.com/saddle/saddle
>
> On Tue, Mar 16, 2021 at 5:06 PM Paul Whalen <pgwhalen@gmail.com> wrote:
> >
> > Hi,
> >
> > I've been using Arrow for some time now, mostly in the context of Arrow
> > Flight between Java and Python. While it's quite easy to convert Arrow
> > data in Python to a pandas dataframe and manipulate it, I'm struggling to
> > find an obvious analogue on the Java side. VectorSchemaRoot is useful for
> > loading/unloading/moving data, but clumsy for doing higher level
> > operations, especially joins/aggregations/etc across "tables".
> >
> > In other words, if I wanted to load non-Arrow-formatted data from
> > somewhere into Java, manipulate it with a dataframe-like API, and then
> > send the result somewhere via Flight, what library would be the
> > best/simplest way to accomplish that? I see lots of progress in other
> > languages, but I'm wondering what would be recommended for Java.
> >
> > I'm currently looking at Spark SQL just in-application, but that seems a
> > touch heavyweight, and I'm not sure it would do exactly what I've
> > described (nor am I terribly familiar with Spark in the first place).
> >
> > If the premise of this question is flawed, please feel free to correct
> > me.
> >
> > Thanks!
> > Paul
>
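
For readers wondering what the "higher level operations" in the original
question look like when written by hand, here is a minimal sketch of a
group-by sum over two columns. It is only an illustration: plain Java arrays
stand in for Arrow vectors, and the class and method names (ColumnarGroupBy,
sumByKey) are hypothetical, not part of Arrow or of any library mentioned in
this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an Arrow API: plain arrays stand in for the
// columns one would otherwise read out of a VectorSchemaRoot.
final class ColumnarGroupBy {

    // Sums values[row] for each distinct keys[row], keeping first-seen key order.
    static Map<String, Double> sumByKey(String[] keys, double[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        Map<String, Double> totals = new LinkedHashMap<>();
        for (int row = 0; row < keys.length; row++) {
            // merge() inserts the value on first sight, otherwise adds to the running total
            totals.merge(keys[row], values[row], Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] symbol = {"A", "B", "A"};
        double[] quantity = {1.0, 2.0, 3.0};
        System.out.println(sumByKey(symbol, quantity));
    }
}
```

Every further operation (joins, multi-column keys, null handling) would need
similar hand-written loops, which is presumably the gap a dataframe-style API
would fill.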
