incubator-crunch-dev mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Flume R -- any interest?
Date Thu, 18 Oct 2012 19:35:22 GMT
Ok, cool. I will try to roll a project template some time next week.
We can start with prototyping and benchmarking something really
simple, such as parallelDo().

My interim goal is perhaps to take some more or less simple algorithm
from Mahout and demonstrate that it can be solved with Rcrunch (or
whatever name it ends up having) in comparable time (performance) but
with far fewer lines of code (say, one of the factorization or
clustering algorithms).


On Wed, Oct 17, 2012 at 10:24 PM, Rahul <rsharma@xebia.com> wrote:
> I am not much of R user but I am interested to see how well we can integrate
> the two. I would be happy to help.
>
> regards,
> Rahul
>
> On 18-10-2012 04:04, Josh Wills wrote:
>>
>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>> wrote:
>>>
>>> Yep, ok.
>>>
>>> I imagine it has to be an R module, so I can set up a maven project
>>> with a java/R code tree (I have been doing that a lot lately). Or if
>>> you have a template to look at, that would be useful too, I guess.
>>
>> No, please go right ahead.
>>
>>>
>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <josh.wills@gmail.com> wrote:
>>>>
>>>> I'd like it to be separate at first, but I am happy to help. Github
>>>> repo?
>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
>>>>
>>>>> Ok, maybe there's a benefit in trying a JRI/RJava prototype on top of
>>>>> Crunch for something simple. This should both save time and prove or
>>>>> disprove whether Crunch via RJava integration is viable.
>>>>>
>>>>> For my part, I can try to do it within the Crunch framework or we can
>>>>> keep it completely separate.
>>>>>
>>>>> -d
>>>>>
>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <jwills@cloudera.com>
>>>>> wrote:
>>>>>>
>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
>>>>>> Murray Stokely?
>>>>>>
>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I was pretty excited to learn of Google's experience with an R mapping
>>>>>>> of flume java at one of the recent BARUGs. I think a lot of applications
>>>>>>> similar to what we do in Mahout could be prototyped using flume R.
>>>>>>>
>>>>>>> I did not quite get the details of Google's implementation of the R
>>>>>>> mapping, but I am not sure that just a direct mapping from R to Crunch
>>>>>>> would be sufficient (and, for the most part, efficient). RJava/JRI and
>>>>>>> JNI seem to be pretty terrible performers for doing that directly.
>>>>>>>
>>>>>>>
>>>>>>> On top of that, I am thinking that if this project could have a
>>>>>>> contributed adapter to Mahout's distributed matrices, that would be
>>>>>>> a very good synergy.
>>>>>>>
>>>>>>> Is there anyone interested in contributing to or advising on an open
>>>>>>> source version of flume R support? Just gauging interest; the Crunch
>>>>>>> list seems like a natural place to poke.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> -Dmitriy
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Director of Data Science
>>>>>> Cloudera
>>>>>> Twitter: @josh_wills
>>
>>
>>
>
