From: Josh Wills
Date: Fri, 26 Oct 2012 14:31:19 -0700
Subject: Re: Flume R -- any interest?
To: crunch-dev@incubator.apache.org

Hey Dmitriy,

Got a fork going and looking forward to playing with crunchR this weekend-- thanks!

J

On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov wrote:
> Project template: https://github.com/dlyubimov/crunchR
>
> The default profile does not compile the R artifact; the R profile does.
> For convenience, the R profile is enabled by supplying -DR on the mvn
> command line, e.g.
>
>   mvn install -DR
>
> There's also a helper that installs the snapshot version of the
> package in the crunchR module.
>
> The RJava and JRI Java dependencies are not in any public Maven repo
> that I could find, so for now they are installed into my GitHub Maven
> repo. It should still compile for third parties.
>
> Compiling with -DR requires R, RJava and, optionally, RProtoBuf. Compiling
> the R docs requires roxygen2 (I think).
>
> For some reason RProtoBuf fails to import into another package: I got a
> weird exception when I put @import RProtoBuf into crunchR, so
> RProtoBuf is now in the "Suggests" category. Down the road that may be a
> problem, though...
>
> Other than the template, not much else has been done so far...
> finding Hadoop libraries and adding them to the package path on
> initialization via "hadoop classpath"... adding the Crunch jars and their
> non-"provided" transitive dependencies to crunchR's Java part...
>
> No legal stuff...
>
> No READMEs... complete stealth at this point.
>
> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov wrote:
> > Ok, cool. I will try to roll out a project template some time next week.
> > We can start with prototyping and benchmarking something really
> > simple, such as parallelDo().
> >
> > My interim goal is perhaps to take some more or less simple algorithm
> > from Mahout and demonstrate that it can be solved with Rcrunch (or whatever
> > name it ends up with) in comparable time (performance) but with far
> > fewer lines of code (say, one of the factorization or clustering algorithms).
> >
> >
> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul wrote:
> >> I am not much of an R user, but I am interested to see how well we can
> >> integrate the two. I would be happy to help.
> >>
> >> regards,
> >> Rahul
> >>
> >> On 18-10-2012 04:04, Josh Wills wrote:
> >>>
> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov wrote:
> >>>>
> >>>> Yep, ok.
> >>>>
> >>>> I imagine it has to be an R module, so I can set up a Maven project
> >>>> with a Java/R code tree (I have been doing that a lot lately). Or if you
> >>>> have a template to look at, that would be useful too, I guess.
> >>>
> >>> No, please go right ahead.
> >>>
> >>>>
> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills wrote:
> >>>>>
> >>>>> I'd like it to be separate at first, but I am happy to help. GitHub
> >>>>> repo?
> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" wrote:
> >>>>>
> >>>>>> Ok, maybe there's a benefit in trying a JRI/RJava prototype on top of
> >>>>>> Crunch for something simple. This should both save time and prove or
> >>>>>> disprove whether Crunch via RJava integration is viable.
> >>>>>>
> >>>>>> On my part, I can try to do it within the Crunch framework, or we can
> >>>>>> keep it completely separate.
> >>>>>>
> >>>>>> -d
> >>>>>>
> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills wrote:
> >>>>>>>
> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
> >>>>>>> Murray Stokely?
> >>>>>>>
> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I was pretty excited to learn of Google's experience with an R mapping
> >>>>>>>> of FlumeJava at one of the recent BARUGs. I think a lot of applications
> >>>>>>>> similar to what we do in Mahout could be prototyped using Flume R.
> >>>>>>>>
> >>>>>>>> I did not quite get the details of Google's implementation of the R
> >>>>>>>> mapping, but I am not sure that just a direct mapping from R to Crunch
> >>>>>>>> would be sufficient (and, for the most part, efficient). RJava/JRI and
> >>>>>>>> JNI seem to perform pretty terribly when used to do that directly.
> >>>>>>>>
> >>>>>>>> On top of that, I am thinking that if this project could have a
> >>>>>>>> contributed adapter to Mahout's distributed matrices, that would be a
> >>>>>>>> very good synergy.
> >>>>>>>>
> >>>>>>>> Is anyone interested in contributing to or advising on an open source
> >>>>>>>> version of Flume R support? Just gauging interest; the Crunch list
> >>>>>>>> seems like a natural place to poke.
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> -Dmitriy
> >>>>>>>
> >>>>>>> --
> >>>>>>> Director of Data Science
> >>>>>>> Cloudera
> >>>>>>> Twitter: @josh_wills

--
Director of Data Science
Cloudera
Twitter: @josh_wills
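Dmitriy's status note mentions finding the Hadoop libraries on initialization via "hadoop classpath". A minimal sketch of that discovery step follows, written in Python purely for illustration; it is not crunchR's code (crunchR is an R/Java package). It assumes the command prints one path-separator-delimited line and that entries ending in "*" follow Java's classpath wildcard convention (every jar directly inside that directory, non-recursive). The function names expand_classpath and hadoop_classpath are hypothetical.

```python
import glob
import os
import subprocess


def expand_classpath(raw, sep=os.pathsep):
    """Split a `hadoop classpath`-style string into concrete entries.

    Hypothetical helper. Entries ending in '*' are treated like Java
    classpath wildcards and expand to the *.jar files directly inside that
    directory (lowercase extension only, a simplification). Other entries,
    such as a conf directory or an explicit jar, are kept as-is. Paths are
    normalized and duplicates dropped, keeping the first occurrence.
    """
    entries = []
    seen = set()
    for item in raw.strip().split(sep):
        if not item:
            continue  # skip empty segments, e.g. from a trailing separator
        if item.endswith("*"):
            directory = item[:-1] or "."
            # sorted() gives a stable order; raw glob order is filesystem-dependent
            candidates = sorted(glob.glob(os.path.join(directory, "*.jar")))
        else:
            candidates = [item]
        for path in candidates:
            norm = os.path.normpath(path)
            if norm not in seen:
                seen.add(norm)
                entries.append(norm)
    return entries


def hadoop_classpath(hadoop_cmd="hadoop"):
    """Run `hadoop classpath` and return the expanded entries.

    Assumes the hadoop launcher is on PATH; raises FileNotFoundError if it
    is missing and CalledProcessError if it exits non-zero.
    """
    result = subprocess.run(
        [hadoop_cmd, "classpath"], capture_output=True, text=True, check=True
    )
    return expand_classpath(result.stdout)
```

Expanding the wildcards up front, rather than passing "dir/*" through, is a deliberate choice in this sketch: a caller that adds entries one by one (for example rJava's .jaddClassPath() on the R side) may not understand Java's wildcard syntax. Whether crunchR does it this way is an assumption.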