crunch-dev mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Flume R -- any interest?
Date Sun, 28 Oct 2012 18:56:22 GMT
On Fri, Oct 26, 2012 at 2:40 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> Ok, cool.
>
> So what state is Crunch in? I take it it is in a fairly advanced state.
> So every api mentioned in the FlumeJava paper is working, right? Or
> is there something specific that is not working?

I think the only thing in the paper that we don't have in a working
state is MSCR fusion. It's mostly just a question of prioritizing it
and getting the work done.

>
> On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <jwills@cloudera.com> wrote:
>> Hey Dmitriy,
>>
>> Got a fork going and looking forward to playing with crunchR this weekend--
>> thanks!
>>
>> J
>>
>> On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> Project template https://github.com/dlyubimov/crunchR
>>>
>>> Default profile does not compile R artifact. R profile compiles R
>>> artifact. For convenience, it is enabled by supplying -DR to mvn
>>> command line, e.g.
>>>
>>> mvn install -DR
>>>
>>> there's also a helper that installs the snapshot version of the
>>> package in the crunchR module.
>>>
>>> There's RJava and JRI java dependencies which i did not find anywhere
>>> in public maven repos; so it is installed into my github maven repo so
>>> far. Should compile for 3rd party.
>>>
>>> -DR compilation requires R, RJava and optionally, RProtoBuf. R Doc
>>> compilation requires roxygen2 (i think).
>>>
>>> For some reason RProtoBuf fails to import into another package, got a
>>> weird exception when i put @import RProtoBuf into crunchR, so
>>> RProtoBuf is now in "Suggests" category. Down the road that may be a
>>> problem though...
>>>
>>> other than the template, not much else has been done so far... finding
>>> hadoop libraries and adding it to the package path on initialization
>>> via "hadoop classpath"... adding Crunch jars and its non-"provided"
>>> transitives to the crunchR's java part...
>>>
>>> No legal stuff...
>>>
>>> No readmes... complete stealth at this point.
>>>
>>> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>> wrote:
>>> > Ok, cool. I will try to roll project template by some time next week.
>>> > we can start with prototyping and benchmarking something really
>>> > simple, such as parallelDo().
>>> >
>>> > My interim goal is to perhaps take some more or less simple algorithm
>>> > from Mahout and demonstrate it can be solved with Rcrunch (or whatever
>>> > name it has to be) in a comparable time (performance) but with much
>>> > fewer lines of code. (say one of factorization or clustering things)
>>> >
>>> >
>>> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <rsharma@xebia.com> wrote:
>>> >> I am not much of R user but I am interested to see how well we can
>>> >> integrate the two. I would be happy to help.
>>> >>
>>> >> regards,
>>> >> Rahul
>>> >>
>>> >> On 18-10-2012 04:04, Josh Wills wrote:
>>> >>>
>>> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Yep, ok.
>>> >>>>
>>> >>>> I imagine it has to be an R module so I can set up a maven project
>>> >>>> with java/R code tree (I have been doing that a lot lately).
>>> >>>> Or if you have a template to look at, it would be useful i guess too.
>>> >>>
>>> >>> No, please go right ahead.
>>> >>>
>>> >>>>
>>> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <josh.wills@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> I'd like it to be separate at first, but I am happy to help. Github
>>> >>>>> repo?
>>> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>>> Ok maybe there's a benefit to try a JRI/RJava prototype on top of
>>> >>>>>> Crunch for something simple. This should both save time and prove or
>>> >>>>>> disprove if Crunch via RJava integration is viable.
>>> >>>>>>
>>> >>>>>> On my part i can try to do it within Crunch framework or we can keep
>>> >>>>>> it completely separate.
>>> >>>>>>
>>> >>>>>> -d
>>> >>>>>>
>>> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <jwills@cloudera.com>
>>> >>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
>>> >>>>>>> Murray Stokely?
>>> >>>>>>>
>>> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>> >>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hello,
>>> >>>>>>>>
>>> >>>>>>>> I was pretty excited to learn of Google's experience of R mapping of
>>> >>>>>>>> flume java on one of recent BARUGs. I think a lot of applications
>>> >>>>>>>> similar to what we do in Mahout could be prototyped using flume R.
>>> >>>>>>>>
>>> >>>>>>>> I did not quite get the details of Google implementation of R
>>> >>>>>>>> mapping, but i am not sure if just a direct mapping from R to Crunch
>>> >>>>>>>> would be sufficient (and, for most part, efficient). RJava/JRI and
>>> >>>>>>>> jni seem to be a pretty terrible performer to do that directly.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On top of it, I am thinking if this project could have a contributed
>>> >>>>>>>> adapter to Mahout's distributed matrices, that would be just a very
>>> >>>>>>>> good synergy.
>>> >>>>>>>>
>>> >>>>>>>> Is there anyone interested in contributing/advising for open source
>>> >>>>>>>> version of flume R support? Just gauging interest, Crunch list seems
>>> >>>>>>>> like a natural place to poke.
>>> >>>>>>>>
>>> >>>>>>>> Thanks.
>>> >>>>>>>>
>>> >>>>>>>> -Dmitriy
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Director of Data Science
>>> >>>>>>> Cloudera
>>> >>>>>>> Twitter: @josh_wills
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
