spark-dev mailing list archives

From Sang Venkatraman <sang.venkatra...@gmail.com>
Subject Re: Any plans for new clustering algorithms?
Date Mon, 21 Apr 2014 16:32:30 GMT
Hi,

On a related note: I have not looked at the MLlib library in detail, but
are there plans to reuse or port over parts of Apache Mahout?

Thanks,
Sang


On Mon, Apr 21, 2014 at 12:07 PM, Evan R. Sparks <evan.sparks@gmail.com> wrote:

> While DBSCAN and others would be welcome contributions, I couldn't agree
> more with Sean.
>
> On Mon, Apr 21, 2014 at 8:58 AM, Sean Owen <sowen@cloudera.com> wrote:
>
> > Nobody asked me, and this is a comment on a broader question, not this
> > one, but:
> >
> > In light of a number of recent items about adding more algorithms,
> > I'll say that I personally think an explosion of algorithms should
> > come after the MLlib "core" is more fully baked. I'm thinking of
> > finishing out the changes to vectors and matrices, for example. Things
> > are going to change significantly in the short term as people use the
> > algorithms and see how well the abstractions do or don't work. I've
> > seen another similar project suffer mightily from too many algorithms
> > too early, so maybe I'm just paranoid.
> >
> > Anyway, long-term, I think lots of good algorithms is a right and
> > proper goal for MLlib, myself. Consistent approaches, representations
> > and APIs will make or break MLlib much more than having or not having
> > a particular algorithm. With the plumbing in place, writing the algo
> > is the fun easy part.
> > --
> > Sean Owen | Director, Data Science | London
> >
> >
> > On Mon, Apr 21, 2014 at 4:39 PM, Aliaksei Litouka
> > <aliaksei.litouka@gmail.com> wrote:
> > > Hi, Spark developers.
> > > Are there any plans for implementing new clustering algorithms in MLLib?
> > > As far as I understand, current version of Spark ships with only one
> > > clustering algorithm - K-Means. I want to contribute to Spark and I'm
> > > thinking of adding more clustering algorithms - maybe
> > > DBSCAN<http://en.wikipedia.org/wiki/DBSCAN>.
> > > I can start working on it. Does anyone want to join me?
> >
>
