incubator-crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Flume R -- any interest?
Date Fri, 26 Oct 2012 21:31:19 GMT
Hey Dmitriy,

Got a fork going and looking forward to playing with crunchR this weekend--
thanks!

J

On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> Project template https://github.com/dlyubimov/crunchR
>
> The default profile does not compile the R artifact; the R profile
> does. For convenience, it is enabled by supplying -DR on the mvn
> command line, e.g.
>
> mvn install -DR
>
> there's also a helper that installs the snapshot version of the
> package in the crunchR module.
>
> There are RJava and JRI java dependencies which i did not find anywhere
> in public maven repos, so they are installed into my github maven repo
> for now. It should compile for 3rd parties.
>
> -DR compilation requires R, RJava and optionally, RProtoBuf. R Doc
> compilation requires roxygen2 (i think).
>
> For some reason RProtoBuf fails to import into another package, got a
> weird exception when i put @import RProtoBuf into crunchR, so
> RProtoBuf is now in "Suggests" category. Down the road that may be a
> problem though...
>
> other than the template, not much else has been done so far... finding
> hadoop libraries and adding them to the package path on initialization
> via "hadoop classpath"... adding Crunch jars and its non-"provided"
> transitives to the crunchR's java part...
>
> No legal stuff...
>
> No readmes... complete stealth at this point.
>
> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
> > Ok, cool. I will try to roll project template by some time next week.
> > we can start with prototyping and benchmarking something really
> > simple, such as parallelDo().
> >
> > My interim goal is to perhaps take some more or less simple algorithm
> > from Mahout and demonstrate it can be solved with Rcrunch (or whatever
> > name it has to be) in a comparable time (performance) but with much
> > fewer lines of code. (say one of factorization or clustering things)
> >
> >
> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <rsharma@xebia.com> wrote:
> >> I am not much of R user but I am interested to see how well we can
> >> integrate the two. I would be happy to help.
> >>
> >> regards,
> >> Rahul
> >>
> >> On 18-10-2012 04:04, Josh Wills wrote:
> >>>
> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> >>> wrote:
> >>>>
> >>>> Yep, ok.
> >>>>
> >>>> I imagine it has to be an R module so I can set up a maven project
> >>>> with java/R code tree (I have been doing that a lot lately). Or if you
> >>>> have a template to look at, it would be useful i guess too.
> >>>
> >>> No, please go right ahead.
> >>>
> >>>>
> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <josh.wills@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> I'd like it to be separate at first, but I am happy to help. Github
> >>>>> repo?
> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Ok maybe there's a benefit to try a JRI/RJava prototype on top of
> >>>>>> Crunch for something simple. This should both save time and prove or
> >>>>>> disprove if Crunch via RJava integration is viable.
> >>>>>>
> >>>>>> On my part i can try to do it within Crunch framework or we can keep
> >>>>>> it completely separate.
> >>>>>>
> >>>>>> -d
> >>>>>>
> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <jwills@cloudera.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
> >>>>>>> Murray Stokely?
> >>>>>>>
> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I was pretty excited to learn of Google's experience of R mapping of
> >>>>>>>> flume java on one of recent BARUGs. I think a lot of applications
> >>>>>>>> similar to what we do in Mahout could be prototyped using flume R.
> >>>>>>>>
> >>>>>>>> I did not quite get the details of Google implementation of R
> >>>>>>>> mapping, but i am not sure if just a direct mapping from R to
> >>>>>>>> Crunch would be sufficient (and, for most part, efficient).
> >>>>>>>> RJava/JRI and jni seem to be a pretty terrible performer to do
> >>>>>>>> that directly.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> on top of it, I am thinking if this project could have a contributed
> >>>>>>>> adapter to Mahout's distributed matrices, that would be just a very
> >>>>>>>> good synergy.
> >>>>>>>>
> >>>>>>>> Is there anyone interested in contributing/advising for open source
> >>>>>>>> version of flume R support? Just gauging interest, Crunch list seems
> >>>>>>>> like a natural place to poke.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> -Dmitriy
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Director of Data Science
> >>>>>>> Cloudera
> >>>>>>> Twitter: @josh_wills
> >>>
> >>>
> >>>
> >>
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
