arrow-dev mailing list archives

From Dean Chen <d...@dv01.co>
Subject Re: Use case for R Arrow Bindings
Date Wed, 19 Jul 2017 21:29:00 GMT
I also sent a note about this to the dev list a month ago. We still have a
large internal need and are interested in helping push this along where we
can. Unfortunately, our team is focused mainly on Spark and doesn't have much
experience working with the R community.

On Wed, Jul 19, 2017 at 1:44 PM Clark Fitzgerald <clarkfitzg@gmail.com>
wrote:

> Hello all,
>
> I saw the notes come through from today's call:
>
> > * R Arrow Bindings?
> >  - Find use cases within the R community, contributors needed
> >  - R Feather bindings a useful starting point
>
> This year I've been working on parallel R on datasets in the 100+ GB range,
> and have found that loading and saving data from text files is a real
> bottleneck. Another consideration is breaking the data up into chunks for
> parallel processing while maintaining metadata and overall structure. So
> I've been watching Parquet and Arrow.
>
> Specifically here are two use cases in R where Arrow / Parquet could be
> helpful:
>
> - Splitting up a large data set into pieces which fit comfortably in memory
> then applying normal R functions to each piece. Basically GROUP BY.
> - Matloff's Software Alchemy, statistical averaging based on independent
> chunks of data. This requires rows to be randomly assigned to chunks.
>
> Another option besides starting from the R Feather bindings is to start
> with an automatically generated set of bindings:
> https://github.com/duncantl/RCodeGen
>
> Best,
> Clark Fitzgerald
>
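
For readers of the archive, the second use case quoted above (randomly assigning rows to chunks, estimating on each chunk independently, then averaging) can be sketched roughly as follows. This is an illustrative sketch only, not code from the thread and not an Arrow or R API: it uses plain Python with the standard library, and the names `random_chunks` and `chunk_average` are hypothetical.

```python
import random
import statistics


def random_chunks(rows, n_chunks, seed=0):
    """Randomly assign rows to n_chunks chunks of near-equal size (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # Striding over the shuffled rows yields chunks whose sizes differ by at most one.
    return [shuffled[i::n_chunks] for i in range(n_chunks)]


def chunk_average(rows, estimator, n_chunks, seed=0):
    """Apply estimator to each chunk independently, then average the per-chunk results."""
    chunks = random_chunks(rows, n_chunks, seed)
    # Each chunk is processed on its own, so this step could run in parallel.
    estimates = [estimator(chunk) for chunk in chunks]
    return statistics.fmean(estimates)


# Toy in-memory data standing in for a data set too large to process at once.
data = [float(x) for x in range(1, 1001)]
approx_median = chunk_average(data, statistics.median, n_chunks=4)
exact_median = statistics.median(data)
```

The random assignment matters because each chunk then behaves like an independent sample of the full data, so the averaged estimate approximates the full-data estimate. A columnar on-disk format would mainly help with the step this sketch skips: reading each chunk quickly while keeping schema and metadata intact.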
-- 
VP of Engineering - dv01, Featured in Forbes Fintech 50 For 2016
<http://www.forbes.com/fintech/2016/#310668d56680>
915 Broadway | Suite 502 | New York, NY 10010
(646)-838-2310
dean@dv01.co | www.dv01.co
