From: Andrew Melo
Date: Tue, 16 Mar 2021 18:43:39 -0500
Subject: Re: Java dataframe library for arrow suggestions
To: user@arrow.apache.org

I can't speak to how complete it is, but I looked earlier for something similar and ran across https://github.com/deeplearning4j/nd4j. It's probably not an exact fit, but it does appear to be able to consume Arrow buffers and expose them to Java.

Cheers
Andrew

On Tue, Mar 16, 2021 at 6:36 PM Wes McKinney wrote:
>
> This has been asked several times in the past but I'm not aware of
> anything "dataframe-like" in Java that's built against Arrow (or
> otherwise) that fills the kind of need that pandas does. There was a
> Scala project some years ago, Saddle [1] (not Arrow-based), built
> initially by one of the early pandas developers but I don't think it's
> still being actively developed. To build a higher-level Java API on
> top of the Arrow Java libraries would be incredibly useful to the
> community I'm sure.
>
> [1]: https://github.com/saddle/saddle
>
> On Tue, Mar 16, 2021 at 5:06 PM Paul Whalen wrote:
> >
> > Hi,
> >
> > I've been using Arrow for some time now, mostly in the context of Arrow Flight between Java and Python. While it's quite easy to convert Arrow data in Python to a pandas dataframe and manipulate it, I'm struggling to find an obvious analogue on the Java side. VectorSchemaRoot is useful for loading/unloading/moving data, but clumsy for doing higher level operations, especially joins/aggregations/etc across "tables".
> >
> > In other words, if I wanted to load non Arrow formatted data from somewhere into Java, manipulate it with a dataframe like API, and then send the result somewhere via Flight, what library would be the best/simplest way to accomplish that?
> > I see lots of progress in other languages, but I'm wondering what would be recommended for Java.
> >
> > I'm currently looking at Spark SQL just in-application, but that seems a touch heavyweight, and I'm not sure it would do exactly what I've described (nor am I terribly familiar with Spark in the first place).
> >
> > If the premise of this question is flawed, please feel free to correct me.
> >
> > Thanks!
> > Paul