incubator-crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Flume R -- any interest?
Date Fri, 16 Nov 2012 19:07:55 GMT
Are you running this using LocalJobRunner? Does calling
Pipeline.enableDebug() before run() help? If it doesn't, it'll help
settle a debate I'm having w/Matthias. ;-)

On Fri, Nov 16, 2012 at 10:22 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> I see the error in the logs, but Pipeline.run() never threw anything;
> isSucceeded() subsequently returns false. Is there any way to extract the
> client-side problem, rather than only being able to state that the job
> failed? Or is this OK, and the logs are the only diagnostics by design?
>
> ============
> 68124 [Thread-8] INFO  org.apache.crunch.impl.mr.exec.CrunchJob  -
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: hdfs://localhost:11010/crunchr-example/input
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
> at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
> at
> org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:331)
> at org.apache.crunch.impl.mr.exec.CrunchJob.submit(CrunchJob.java:135)
> at
> org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:251)
> at
> org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.run(CrunchJobControl.java:279)
> at java.lang.Thread.run(Thread.java:662)
>
>
> On Mon, Nov 12, 2012 at 5:41 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
>> For hadoop nodes, i guess yet another option is to soft-link the .so into
>> hadoop's native lib folder.
>>
>>
>> On Mon, Nov 12, 2012 at 5:37 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> I actually want to defer this to hadoop admins; we just need to create a
>>> procedure for setting up nodes, ideally as simple as possible. Something
>>> like:
>>>
>>> 1) set up R
>>> 2) install.packages(c("rJava","RProtoBuf","crunchR"))
>>> 3) R CMD javareconf
>>> 4) add the result of R --vanilla <<< 'system.file("jri", package="rJava")'
>>> to either the mapred command lines or LD_LIBRARY_PATH...
>>>
>>> but it will depend on their versions of hadoop, jre, etc. I hoped crunch
>>> might have something to hide a lot of that complexity (since it is about
>>> hiding complexities, for the most part :) ). Besides, hadoop has a way to
>>> ship .so's to the backend, so if crunch had an api to do something similar,
>>> it is conceivable that the driver might yank and ship it too, to hide that
>>> complexity as well. But then there's a host of issues around how to handle
>>> potentially different rJava versions installed on different nodes... So it
>>> increasingly looks like something we might want to defer to sysops, with
>>> an approximate set of requirements.
>>>
>>>
>>> On Mon, Nov 12, 2012 at 5:29 PM, Josh Wills <jwills@cloudera.com> wrote:
>>>
>>>> On Mon, Nov 12, 2012 at 5:17 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>> wrote:
>>>>
>>>> > so java tasks need to be able to load libjri.so from
>>>> > whatever system.file("jri", package="rJava") says.
>>>> >
>>>> > Traditionally, these issues were handled with -Djava.library.path.
>>>> > Apparently there's nothing a java task can do to enable the loadLibrary()
>>>> > command to see the damn library once started. But -Djava.library.path
>>>> > requires nodes to configure the jvm command line and lock it from
>>>> > modifications by the client, which is fine.
>>>> >
>>>> > I also discovered that LD_LIBRARY_PATH actually works with jre 1.6 (again).
>>>> >
>>>> > but... any other suggestions about best practice for configuring crunch to
>>>> > run user's .so's?
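
For anyone debugging the same setup, a minimal sketch of a JVM-side check might look like the following. It uses only standard JDK calls (System.getProperty, System.mapLibraryName, System.loadLibrary). The class name NativeLibCheck and the default library name "jri" (chosen to match the libjri.so mentioned above) are assumptions for illustration, not part of Crunch or rJava.

```java
import java.io.File;

// Diagnostic sketch: report where this JVM looks for native libraries and
// whether a given library can actually be loaded from there.
class NativeLibCheck {

    // Returns true if System.loadLibrary(name) succeeds, false otherwise.
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Cannot load '" + name + "': " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: "jri" maps to libjri.so on Linux via mapLibraryName.
        String lib = args.length > 0 ? args[0] : "jri";
        String path = System.getProperty("java.library.path", "");
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, System.mapLibraryName(lib));
            System.out.println(dir + " -> " + (candidate.exists() ? "found" : "missing"));
        }
        System.out.println("loadLibrary(" + lib + "): " + tryLoad(lib));
    }
}
```

Running it inside a task JVM, with the same -Djava.library.path or LD_LIBRARY_PATH setting the cluster uses, should show which directory, if any, actually contains the library.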
>>>> >
>>>>
>>>> Not off the top of my head. I suspect that whatever you come up with will
>>>> become the "best practice." :)
>>>>
>>>> >
>>>> > thanks.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Sun, Nov 11, 2012 at 1:41 PM, Josh Wills <josh.wills@gmail.com>
>>>> wrote:
>>>> >
>>>> > > I believe that is a safe assumption, at least right now.
>>>> > >
>>>> > >
>>>> > > On Sun, Nov 11, 2012 at 1:38 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > >
>>>> > > > Question.
>>>> > > >
>>>> > > > So in the Crunch api, initialize() doesn't get an emitter, and process()
>>>> > > > gets an emitter every time.
>>>> > > >
>>>> > > > However, my guess is that any single reincarnation of a DoFn object in
>>>> > > > the backend will always be getting the same emitter through its
>>>> > > > lifecycle. Is it an admissible assumption, or is there currently a
>>>> > > > counterexample to that?
>>>> > > >
>>>> > > > The problem is that as I implement the two-way pipeline of input and
>>>> > > > emitter data between R and Java, I am bulking these calls together for
>>>> > > > performance reasons. Each individual datum in these chunks of data will
>>>> > > > not have emitter function information attached to it in any way. (Well,
>>>> > > > it could, but it would be a performance killer, and I bet the emitter
>>>> > > > never changes.)
>>>> > > >
>>>> > > > So, thoughts? Can I assume the emitter never changes between the first
>>>> > > > and last call to a DoFn instance?
>>>> > > >
>>>> > > > thanks.
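
The assumption above can be sketched as follows. This is not Crunch code: SimpleEmitter and BatchingFn are hypothetical stand-ins for an emitter and a DoFn-like function, and the uppercase step is only a placeholder for the bulk call into R. The sketch caches the emitter from the first process() call, buffers inputs into chunks, and fails loudly if a later call ever hands in a different emitter, so a violated assumption would surface instead of silently misrouting output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an emitter; not the Crunch interface.
interface SimpleEmitter<T> {
    void emit(T value);
}

// Buffers inputs and handles them in chunks, remembering the emitter seen on
// the first call and rejecting any later call that passes a different one.
class BatchingFn {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private SimpleEmitter<String> emitter; // captured on the first process() call

    BatchingFn(int batchSize) {
        this.batchSize = batchSize;
    }

    void process(String input, SimpleEmitter<String> e) {
        if (emitter == null) {
            emitter = e;
        } else if (emitter != e) {
            throw new IllegalStateException("emitter changed between calls");
        }
        buffer.add(input);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Call once at end of input to drain whatever is still buffered.
    void cleanup() {
        flush();
    }

    private void flush() {
        if (emitter == null) {
            return;
        }
        for (String s : buffer) {
            emitter.emit(s.toUpperCase()); // placeholder for the bulk R round trip
        }
        buffer.clear();
    }
}
```

With a batch size of 2, three inputs produce two emitted values after the second call and the third only once cleanup() runs, which is the behaviour a chunked R/Java bridge would rely on.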
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Oct 29, 2012 at 6:32 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > > >
>>>> > > > > yes...
>>>> > > > >
>>>> > > > > i think it worked for me before, although just adding all jars from R
>>>> > > > > package distribution would be a little bit more appropriate approach
>>>> > > > > -- but it creates a problem with jars in dependent R packages. I think
>>>> > > > > it would be much easier to just compile a hadoop-job file and stick it
>>>> > > > > in rather than doing cherry-picking of individual jars from who knows
>>>> > > > > how many locations.
>>>> > > > >
>>>> > > > > i think i used the hadoop job format with distributed cache before and
>>>> > > > > it worked... at least with Pig "register jar" functionality.
>>>> > > > >
>>>> > > > > ok i guess i will just try if it works.
>>>> > > > >
>>>> > > > > On Mon, Oct 29, 2012 at 6:24 PM, Josh Wills <jwills@cloudera.com> wrote:
>>>> > > > > > On Mon, Oct 29, 2012 at 5:46 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > > > > >
>>>> > > > > >> Great! so it is in Crunch.
>>>> > > > > >>
>>>> > > > > >> does it support hadoop-job jar format or only pure java jars?
>>>> > > > > >>
>>>> > > > > >
>>>> > > > > > I think just pure jars-- you're referring to hadoop-job format as
>>>> > > > > > having all the dependencies in a lib/ directory within the jar?
>>>> > > > > >
>>>> > > > > >
>>>> > > > > >>
>>>> > > > > >> On Mon, Oct 29, 2012 at 5:10 PM, Josh Wills <jwills@cloudera.com> wrote:
>>>> > > > > >> > On Mon, Oct 29, 2012 at 5:04 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > > > > >> >
>>>> > > > > >> >> I think i need functionality to add more jars (or external hadoop-jar)
>>>> > > > > >> >> to drive that from an R package. Just setting job jar by class is not
>>>> > > > > >> >> enough. I can push overall job-jar as an additional jar to R package;
>>>> > > > > >> >> however, i cannot really run hadoop command line on it, i need to set
>>>> > > > > >> >> up classpath thru RJava.
>>>> > > > > >> >>
>>>> > > > > >> >> Traditional single hadoop job jar will unlikely work here since we
>>>> > > > > >> >> cannot hardcode pipelines in java code but rather have to construct
>>>> > > > > >> >> them on the fly. (well, we could serialize pipeline definitions from R
>>>> > > > > >> >> and then replay them in a driver -- but that's too cumbersome and more
>>>> > > > > >> >> work than it has to be.) There's no reason why i shouldn't be able to
>>>> > > > > >> >> do pig-like "register jar" or "setJobJar" (mahout-like) when kicking
>>>> > > > > >> >> off a pipeline.
>>>> > > > > >> >>
>>>> > > > > >> >
>>>> > > > > >> > o.a.c.util.DistCache.addJarToDistributedCache?
>>>> > > > > >> >
>>>> > > > > >> >
>>>> > > > > >> >>
>>>> > > > > >> >>
>>>> > > > > >> >> On Mon, Oct 29, 2012 at 10:17 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > > > > >> >> > Ok, sounds very promising...
>>>> > > > > >> >> >
>>>> > > > > >> >> > i'll try to start digging on the driver part this week then (Pipeline
>>>> > > > > >> >> > wrapper in R5).
>>>> > > > > >> >> >
>>>> > > > > >> >> > On Sun, Oct 28, 2012 at 11:56 AM, Josh Wills <josh.wills@gmail.com> wrote:
>>>> > > > > >> >> >> On Fri, Oct 26, 2012 at 2:40 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > > > > >> >> >>> Ok, cool.
>>>> > > > > >> >> >>>
>>>> > > > > >> >> >>> So what state is Crunch in? I take it is in a fairly advanced state.
>>>> > > > > >> >> >>> So every api mentioned in the FlumeJava paper is working, right? Or
>>>> > > > > >> >> >>> there's something that is not working specifically?
>>>> > > > > >> >> >>
>>>> > > > > >> >> >> I think the only thing in the paper that we don't have in a working
>>>> > > > > >> >> >> state is MSCR fusion. It's mostly just a question of prioritizing it
>>>> > > > > >> >> >> and getting the work done.
>>>> > > > > >> >> >>
>>>> > > > > >> >> >>>
>>>> > > > > >> >> >>> On Fri, Oct 26, 2012 at 2:31 PM, Josh Wills <jwills@cloudera.com> wrote:
>>>> > > > > >> >> >>>> Hey Dmitriy,
>>>> > > > > >> >> >>>>
>>>> > > > > >> >> >>>> Got a fork going and looking forward to playing with crunchR this
>>>> > > > > >> >> >>>> weekend-- thanks!
>>>> > > > > >> >> >>>>
>>>> > > > > >> >> >>>> J
>>>> > > > > >> >> >>>>
>>>> > > > > >> >> >>>> On Wed, Oct 24, 2012 at 1:28 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > > > > >> >> >>>>
>>>> > > > > >> >> >>>>> Project template https://github.com/dlyubimov/crunchR
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> Default profile does not compile R artifact. R profile compiles R
>>>> > > > > >> >> >>>>> artifact. for convenience, it is enabled by supplying -DR to mvn
>>>> > > > > >> >> >>>>> command line, e.g.
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> mvn install -DR
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> there's also a helper that installs the snapshot version of the
>>>> > > > > >> >> >>>>> package in the crunchR module.
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> There's RJava and JRI java dependencies which i did not find anywhere
>>>> > > > > >> >> >>>>> in public maven repos; so it is installed into my github maven repo so
>>>> > > > > >> >> >>>>> far. Should compile for 3rd party.
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> -DR compilation requires R, RJava and optionally, RProtoBuf. R Doc
>>>> > > > > >> >> >>>>> compilation requires roxygen2 (i think).
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> For some reason RProtoBuf fails to import into another package, got a
>>>> > > > > >> >> >>>>> weird exception when i put @import RProtoBuf into crunchR, so
>>>> > > > > >> >> >>>>> RProtoBuf is now in "Suggests" category. Down the road that may be a
>>>> > > > > >> >> >>>>> problem though...
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> other than the template, not much else has been done so far... finding
>>>> > > > > >> >> >>>>> hadoop libraries and adding it to the package path on initialization
>>>> > > > > >> >> >>>>> via "hadoop classpath"... adding Crunch jars and its non-"provided"
>>>> > > > > >> >> >>>>> transitives to the crunchR's java part...
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> No legal stuff...
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> No readmes... complete stealth at this point.
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>> On Thu, Oct 18, 2012 at 12:35 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>> > > > > >> >> >>>>> > Ok, cool. I will try to roll project template by some time next week.
>>>> > > > > >> >> >>>>> > we can start with prototyping and benchmarking something really
>>>> > > > > >> >> >>>>> > simple, such as parallelDo().
>>>> > > > > >> >> >>>>> >
>>>> > > > > >> >> >>>>> > My interim goal is to perhaps take some more or less simple algorithm
>>>> > > > > >> >> >>>>> > from Mahout and demonstrate it can be solved with Rcrunch (or whatever
>>>> > > > > >> >> >>>>> > name it has to be) in a comparable time (performance) but with much
>>>> > > > > >> >> >>>>> > fewer lines of code. (say one of factorization or clustering things)
>>>> > > > > >> >> >>>>> >
>>>> > > > > >> >> >>>>> >
>>>> > > > > >> >> >>>>> > On Wed, Oct 17, 2012 at 10:24 PM, Rahul <rsharma@xebia.com> wrote:
>>>> > > > > >> >> >>>>> >> I am not much of R user but I am interested to see how well we can
>>>> > > > > >> >> >>>>> >> integrate the two. I would be happy to help.
>>>> > > > > >> >> >>>>> >>
>>>> > > > > >> >> >>>>> >> regards,
>>>> > > > > >> >> >>>>> >> Rahul
>>>> > > > > >> >> >>>>> >>
>>>> > > > > >> >> >>>>> >> On 18-10-2012 04:04, Josh Wills wrote:
>>>> > > > > >> >> >>>>> >>>
>>>> > > > > >> >> >>>>> >>> On Wed, Oct 17, 2012 at 3:07 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>> > > > > >> >> >>>>> >>> wrote:
>>>> > > > > >> >> >>>>> >>>>
>>>> > > > > >> >> >>>>> >>>> Yep, ok.
>>>> > > > > >> >> >>>>> >>>>
>>>> > > > > >> >> >>>>> >>>> I imagine it has to be an R module so I can set up a maven project
>>>> > > > > >> >> >>>>> >>>> with java/R code tree (I have been doing that a lot lately). Or if you
>>>> > > > > >> >> >>>>> >>>> have a template to look at, it would be useful i guess too.
>>>> > > > > >> >> >>>>> >>>
>>>> > > > > >> >> >>>>> >>> No, please go right ahead.
>>>> > > > > >> >> >>>>> >>>
>>>> > > > > >> >> >>>>> >>>>
>>>> > > > > >> >> >>>>> >>>> On Wed, Oct 17, 2012 at 3:02 PM, Josh Wills <josh.wills@gmail.com> wrote:
>>>> > > > > >> >> >>>>> >>>>>
>>>> > > > > >> >> >>>>> >>>>> I'd like it to be separate at first, but I am happy to help. Github
>>>> > > > > >> >> >>>>> >>>>> repo?
>>>> > > > > >> >> >>>>> >>>>> On Oct 17, 2012 2:57 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
>>>> > > > > >> >> >>>>> >>>>>
>>>> > > > > >> >> >>>>> >>>>>> Ok maybe there's a benefit to try a JRI/RJava prototype on top of
>>>> > > > > >> >> >>>>> >>>>>> Crunch for something simple. This should both save time and prove or
>>>> > > > > >> >> >>>>> >>>>>> disprove if Crunch via RJava integration is viable.
>>>> > > > > >> >> >>>>> >>>>>>
>>>> > > > > >> >> >>>>> >>>>>> On my part i can try to do it within Crunch framework or we can keep
>>>> > > > > >> >> >>>>> >>>>>> it completely separate.
>>>> > > > > >> >> >>>>> >>>>>>
>>>> > > > > >> >> >>>>> >>>>>> -d
>>>> > > > > >> >> >>>>> >>>>>>
>>>> > > > > >> >> >>>>> >>>>>> On Wed, Oct 17, 2012 at 2:08 PM, Josh Wills <jwills@cloudera.com>
>>>> > > > > >> >> >>>>> >>>>>> wrote:
>>>> > > > > >> >> >>>>> >>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>> I am an avid R user and would be into it-- who gave the talk? Was it
>>>> > > > > >> >> >>>>> >>>>>>> Murray Stokely?
>>>> > > > > >> >> >>>>> >>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>> On Wed, Oct 17, 2012 at 2:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>> > > > > >> >> >>>>> >>>>>>> wrote:
>>>> > > > > >> >> >>>>> >>>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>> Hello,
>>>> > > > > >> >> >>>>> >>>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>> I was pretty excited to learn of Google's experience of R
>>>> > > > > >> >> >>>>> >>>>>>>> mapping of flume java on one of recent BARUGs. I think a lot of
>>>> > > > > >> >> >>>>> >>>>>>>> applications similar to what we do in Mahout could be prototyped
>>>> > > > > >> >> >>>>> >>>>>>>> using flume R.
>>>> > > > > >> >> >>>>> >>>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>> I did not quite get the details of Google implementation of R
>>>> > > > > >> >> >>>>> >>>>>>>> mapping, but i am not sure if just a direct mapping from R to
>>>> > > > > >> >> >>>>> >>>>>>>> Crunch would be sufficient (and, for most part, efficient).
>>>> > > > > >> >> >>>>> >>>>>>>> RJava/JRI and jni seem to be a pretty terrible performer to do
>>>> > > > > >> >> >>>>> >>>>>>>> that directly.
>>>> > > > > >> >> >>>>> >>>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>> on top of it, I am thinking if this project could have a
>>>> > > > > >> >> >>>>> >>>>>>>> contributed adapter to Mahout's distributed matrices, that would
>>>> > > > > >> >> >>>>> >>>>>>>> be just a very good synergy.
>>>> > > > > >> >> >>>>> >>>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>> Is there anyone interested in contributing/advising for open
>>>> > > > > >> >> >>>>> >>>>>>>> source version of flume R support? Just gauging interest, Crunch
>>>> > > > > >> >> >>>>> >>>>>>>> list seems like a natural place to poke.
>>>> > > > > >> >> >>>>> >>>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>> Thanks.
>>>> > > > > >> >> >>>>> >>>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>> -Dmitriy
>>>> > > > > >> >> >>>>> >>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>>
>>>> > > > > >> >> >>>>> >>>>>>> --
>>>> > > > > >> >> >>>>> >>>>>>> Director of Data Science
>>>> > > > > >> >> >>>>> >>>>>>> Cloudera
>>>> > > > > >> >> >>>>> >>>>>>> Twitter: @josh_wills
>>>> > > > > >> >> >>>>> >>>
>>>> > > > > >> >> >>>>> >>>
>>>> > > > > >> >> >>>>> >>>
>>>> > > > > >> >> >>>>> >>
>>>> > > > > >> >> >>>>>
>>>> > > > > >> >> >>>>
>>>> > > > > >> >> >>>>
>>>> > > > > >> >> >>>>
>>>> > > > > >> >> >>>> --
>>>> > > > > >> >> >>>> Director of Data Science
>>>> > > > > >> >> >>>> Cloudera <http://www.cloudera.com>
>>>> > > > > >> >> >>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>> > > > > >> >>
>>>> > > > > >> >
>>>> > > > > >> >
>>>> > > > > >> >
>>>> > > > > >> > --
>>>> > > > > >> > Director of Data Science
>>>> > > > > >> > Cloudera <http://www.cloudera.com>
>>>> > > > > >> > Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>> > > > > >>
>>>> > > > > >
>>>> > > > > >
>>>> > > > > >
>>>> > > > > > --
>>>> > > > > > Director of Data Science
>>>> > > > > > Cloudera <http://www.cloudera.com>
>>>> > > > > > Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Director of Data Science
>>>> Cloudera <http://www.cloudera.com>
>>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>>
>>>
>>>
>>



-- 
Director of Data Science
Cloudera
Twitter: @josh_wills
