mahout-dev mailing list archives

From "Jeff Eastman" <>
Subject RE: MapReduce, machine learning, and introductions
Date Fri, 04 Apr 2008 00:04:52 GMT
Hi Gary,


Thanks for your suggestion on Random Forests. I've cc'd this thread to the
Mahout dev list just in case you would like to continue it there. We have
received a lot of interest from students in conjunction with the Google
Summer of Code project and others looking to contribute to our mission. We
are not restricted at all to the 10 original NIPS algorithms; they were just
a natural starting point and a way to "prime the pump". Perhaps some more
information on your experiences using it on real manufacturing data would
motivate an implementation. 





From: Gary Bradski [] 
Sent: Thursday, April 03, 2008 4:46 PM
To: Jeff Eastman
Cc: Andrew Y. Ng; Dubey, Pradeep; Jimmy Lin
Subject: Re: MapReduce, machine learning, and introductions


One of the things I'd like to see parallelized is Random forests.  Though
there is no "best" algorithm for classification, when I ran it on Intel
manufacturing data sets it was almost always beating boosting, SVM, and
MART. Zisserman claimed it worked best on keypoint recognition in vision and
his version was the simplest one I've heard.

This is one of those "brain dead" parallelizations -- just parcel out the
learning of trees on randomly selected subsets of the data.  In learning,
each tree randomly selects from a subset of the features at each node.

It has nice techniques for doing feature selection as well.
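
The parallelization described above can be sketched in a few lines. This is only an illustrative sketch, not Mahout code and not the Intel implementation: the names `build_tree`, `train_one`, `train_forest`, and `predict` are hypothetical, a thread pool stands in for Hadoop map tasks, and the split criterion (Gini impurity) is an assumption. Each task draws its own bootstrap sample, considers a random subset of the features at every node, and needs no communication with the other tasks; prediction is a majority vote. Class labels are assumed not to be tuples, since tuples mark internal nodes here.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def gini(rows):
    """Gini impurity of the labels in rows; each row is (features, label)."""
    counts = Counter(label for _, label in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def build_tree(rows, n_sub, rng, depth=0, max_depth=5):
    """Grow one tree, considering n_sub randomly chosen features per node."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: a bare class label
    n_features = len(rows[0][0])
    best = None
    for f in rng.sample(range(n_features), n_sub):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return majority  # no usable split among the sampled features
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree(left, n_sub, rng, depth + 1, max_depth),
            build_tree(right, n_sub, rng, depth + 1, max_depth))


def train_one(args):
    """Map step: fit one tree on a bootstrap sample drawn with its own seed."""
    rows, n_sub, seed = args
    rng = random.Random(seed)
    sample = [rng.choice(rows) for _ in rows]  # sample with replacement
    return build_tree(sample, n_sub, rng)


def train_forest(rows, n_trees=25, n_sub=1, workers=4):
    """Train the trees independently; tasks share nothing but the input data."""
    tasks = [(rows, n_sub, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_one, tasks))


def predict(forest, x):
    """Reduce step: majority vote over the individual trees' predictions."""
    votes = []
    for node in forest:
        while isinstance(node, tuple):  # descend until a leaf label is reached
            f, threshold, left, right = node
            node = left if x[f] <= threshold else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]
```

Because CPython threads do not run CPU-bound work in parallel, the thread pool here only demonstrates the task structure; on a cluster each `train_one` call would be a separate map task, with the list of trees collected at the end.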


On Thu, Apr 3, 2008 at 4:27 PM, Jeff Eastman <> wrote:

Well, it has been a couple of years. Thanks for the response and
retransmission. Good luck in your current endeavors.





From: Gary Bradski [] 
Sent: Thursday, April 03, 2008 4:23 PM
To: Andrew Y. Ng; Dubey, Pradeep
Cc: Jeff Eastman; Jimmy Lin
Subject: Re: MapReduce, machine learning, and introductions


Re: Parallel Machine learning project mahout

When I was at Intel, I began carving out a parallel Machine learning niche
since it was something interesting that Intel would also be interested in.

But that was two companies ago for me and I haven't touched it since.  I'm
now focused on sensor guided manipulation and revamping the computer vision
library I started, OpenCV.  

About all I can do is send the last known working version of the code that I
had.  I've CC'd Pradeep Dubey, an Intel Fellow with whom I worked on some
of the parallel machine learning issues; his team also studied that code.  I
don't know what happened since, but parallel machine learning might still be
one of his active areas and maybe there's some synergy there.


On Thu, Apr 3, 2008 at 3:38 PM, Andrew Y. Ng <> wrote:

Hi Jeff,

I'd been hearing increasing amounts of buzz on Mahout and am excited
about it, but unfortunately am no longer working in this space.
Gary Bradski, CC-ed above, would be a great person to talk to about
Map-Reduce and machine learning, though!


On Thu, 3 Apr 2008, Jeff Eastman wrote:

> Hi Andrew,
> I'm a committer on the new Mahout project. As Jimmy indicated, we are
> setting out to implement versions of the NIPS paper algorithms on top of
> Hadoop. So far, we have committed versions of only k-means and canopy but
> have a number of other algorithms in various stages of implementation. I
> don't have any immediate questions but I live in Los Altos and so it would
> be convenient to visit if you or your colleagues do have questions about
> Mahout.
> In any case I thought it would be nice to introduce myself.
> Jeff
> Jeff Eastman, Ph.D.
> Windward Solutions Inc.
> +1.415.298.0023
> > -----Original Message-----
> > From: Jimmy Lin []
> > Sent: Saturday, March 29, 2008 8:37 PM
> > To:
> > Cc: Jeff Eastman
> > Subject: MapReduce, machine learning, and introductions
> >
> > Hi Andrew,
> >
> > How are things going?  Haven't seen you in a while... hope everything
> > is going well at Stanford.
> >
> > I was recently in the bay area attending the Yahoo Hadoop summit---
> > I've been using MapReduce in teaching and research recently (stat MT,
> > IR, etc.), so I was there talking about that.
> >
> > Are you aware of the Apache Mahout project?  They are putting together
> > an open-source MR toolkit for machine-learning-ish things; one of the
> > things they're working on is implementing the various algorithms in
> > your NIPS paper.  Jeff Eastman is involved in the project, cc'ed
> > here.  I thought I'd put the two of you in touch...
> >
> > Best,
> > Jimmy


