mahout-dev mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: 0xdata interested in contributing
Date Sun, 16 Mar 2014 16:17:22 GMT
So your Mahout DRM work was targeted for production at your company and was working well but
other parts of the project fell through and it didn’t get deployed. Some of it is almost
a year old and pretty mature.  

--This is very good news.

You are also saying that the integration model you used for Spark would probably mostly work
for other solver frameworks like Stratosphere but it doesn’t look appropriate for h2o. 

--Good to know.

Your last point is that speed is not so much a deciding factor as other less tangible things.
Your example is R which has 5000 packages and counting but is notoriously slow. By that I
assume you are saying a speed comparison is not nearly as important as other factors, most
of which have to do with attracting the largest community of users and contributors.

--Here we agree for sure. Getting a faster regression or random forest implementation (as
long as it takes Mahout formats as input) is great. But if it implies that committers move
to the platform (h2o) used in these implementations then someone must make a case for why
it’s in the roadmap. 

On Mar 14, 2014, at 3:55 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

Pat, sorry for going off-topic -- this code is actually about a year old at
heart. I was using it to run some custom methods back at my company, but I had
to largely reshape it to fit Mahout once I got permission to contribute. So
this took a while, but the idea is certainly not new. At least parts of
this code (e.g. DRM serialization) used to run something real at some
point. Actually, the initial materialization of this code predates the MLI
talks that I was referring to (at least when I first heard of MLI).
Unfortunately, our experiments with big data solvers are currently nowhere
close to production due to product priorities -- so that was in part why I
said, well, let's at least make it public if we don't use it.

But you can potentially develop this idea to further optimize and support
basic data frame operators as well, all while staying independent of the back
end. Unfortunately, the back end has to pass a certain programming-model
maturity test. Right now that would be Spark, Stratosphere and other
FlumeJava-like models, but I don't think 0xdata in particular, as it stands,
passes it.

Another thing (also used at our office) is that you can simply write it as a
driver script and run it in a Scala shell, akin to R.

The next step would be to fire up developers to write algorithms. I think R
is closing in on about 5,000 packages now. I probably will not miss the truth
by much by saying this is exactly because of it being an ML environment
(and certainly not because of its performance -- R is notoriously slow).




On Fri, Mar 14, 2014 at 3:39 PM, Pat Ferrel <pat@occamsmachete.com> wrote:

> Cool, I'm super excited to see RSJ on Spark integrated into the mainline
> with Dmitriy's work. I really really hope that it is seen as important
> and doesn't get stalled by committers being demotivated. I had no idea that
> what I consider the heart of Mahout was so close to being real on Spark.
> 
> I'm also happy to hear that you are full speed ahead for this Spark work.
> I obviously got the wrong impression.
> 
> As to "new contributors who have some interesting capabilities" great, as
> long as it doesn't end up defocusing people. Old committers are naturally
> going to wonder where to put their efforts with this proposal. Some may
> just give up until the dust settles. I'm sure we can agree that that would
> not be good.
> 
> The question of roadmap is, more than ever, up for discussion. I would
> just plead one last time that Spark work not be stalled while this is
> worked out.
> 
> On Mar 14, 2014, at 1:00 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> 
> 
> Pat
> 
> I am not suggesting that we walk away from anything.
> 
> I am suggesting that we welcome new contributors who have some interesting
> capabilities.
> 
> I also suggest that those efforts should be made to work well with
> existing efforts.
> 
> Sent from my iPhone
> 
>> On Mar 14, 2014, at 10:58, Pat Ferrel <pat@occamsmachete.com> wrote:
>> 
>> I think people (including me) have underestimated how much you and
>> Sebastian have done on Spark. Realistically it sounds like we are talking
>> about walking away from that in favor of an unknown.
>> 
>> 0xdata's community has not been solving the problems I care about. You
>> guys have.
> 
> 

