mahout-dev mailing list archives

From deneche abdelhakim <a_dene...@yahoo.fr>
Subject Re: GSOC Mahout.GA, next steps ?
Date Mon, 09 Jun 2008 10:14:32 GMT
I found a cool introduction to evolutionary algorithms and added it to the wiki,
in case anyone is interested.


--- On Wed, 28 May 2008, Grant Ingersoll <gsingers@apache.org> wrote:

> From: Grant Ingersoll <gsingers@apache.org>
> Subject: Re: GSOC Mahout.GA, next steps ?
> To: mahout-dev@lucene.apache.org
> Date: Wednesday, 28 May 2008, 13:11
> This sounds good.  I don't know a lot about GAs, so if others have
> insight, that would be great.  It would also be handy if you could put
> up a section on the Wiki about GAs and maybe post some links to basic
> papers there, so people that aren't familiar can go do some background
> reading.
>
> I will try to get to MAHOUT-56 this week, but others can jump in and
> review as well.
> 
> -Grant
> 
> On May 27, 2008, at 4:52 AM, deneche abdelhakim wrote:
> 
> > In a GA there are many things that can be distributed, and one should
> > always start with the most compute-intensive task. This is very
> > problem dependent, but in most cases the fitness evaluation function
> > (FEF) is the part to distribute.
> >
> > The FEF evaluates every individual in the population, and it may need
> > some data (D) to do so. For example, in the Traveling Salesman
> > Problem, the problem is defined by a set of cities and the distances
> > between them; the FEF needs those distances to evaluate the
> > individuals.
> >
> > I see 2 ways to distribute the FEF:
> >
> > A. If the data D is not big and fits on every cluster node, then the
> > easiest solution is to use each Mapper to evaluate one individual and
> > to pass D to all the mappers (using a Job parameter or the
> > DistributedCache). The input of the job is the population of
> > individuals. For someone used to working with Watchmaker, solution A
> > is straightforward: only one line of code needs to change.
> >
> > B. If the data D is really big and spans multiple nodes, then the FEF
> > should be written as Mappers and Reducers: the population of
> > individuals is passed to all the mappers (again using the
> > DistributedCache or a Job parameter) and D is now the input of the
> > Job.
> >
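
As a rough illustration of solution A (this is not the MAHOUT-56 code), here is
a plain-Python sketch that only simulates the mapper step locally. The city
coordinates, the function names and the tuple representation of a tour are all
made up for the example; a real job would ship D to the mappers through a Job
parameter or the DistributedCache.

```python
import math

# Hypothetical problem data D: city coordinates, small enough to copy to every node.
CITIES = {"a": (0, 0), "b": (3, 0), "c": (3, 4), "d": (0, 4)}


def tour_length(tour, cities):
    """Fitness evaluation function (FEF): length of the closed tour."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def map_individual(individual, cities):
    """Stand-in for one Mapper call: one individual in, (individual, fitness) out."""
    return individual, tour_length(individual, cities)


def evaluate_population(population, cities):
    # Each call is independent, so a cluster could run them on different nodes;
    # here they simply run one after another in the local process.
    return dict(map_individual(ind, cities) for ind in population)


# The job input is the population; every "mapper" sees the same copy of D.
population = [("a", "b", "c", "d"), ("a", "c", "b", "d")]
fitness = evaluate_population(population, CITIES)
```

The point of the sketch is that the per-individual evaluations share nothing
except the read-only data D, which is why the population can be the job input.
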
> > [MAHOUT-56] contains a possible implementation of solution A. Now I
> > should start thinking about solution B, and all I need is a problem
> > that uses very big datasets. I already proposed one in my GSoC
> > proposal: using a Genetic Algorithm to find a good binary
> > classification rule for a given dataset. But I am open to any other
> > suggestion.
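
To make solution B a bit more concrete, here is a plain-Python simulation of the
map and reduce steps for the classification-rule idea. Everything specific in it
is an assumption for illustration only: the rule encoding (feature index,
threshold), the tiny dataset, the function names, and the choice of "number of
correctly classified rows" as the fitness. It is not Hadoop code.

```python
from collections import defaultdict

# Hypothetical individuals: a rule (feature_index, threshold) predicts class 1
# when row[feature_index] > threshold, and class 0 otherwise.
POPULATION = [(0, 2.5), (1, 0.5)]


def map_split(split, population):
    """Stand-in for a Mapper: score every rule against one split of the dataset."""
    for rule_id, (feature, threshold) in enumerate(population):
        correct = sum(
            1
            for features, label in split
            if int(features[feature] > threshold) == label
        )
        yield rule_id, correct


def reduce_counts(pairs):
    """Stand-in for the Reducer: sum the partial counts emitted for each rule."""
    totals = defaultdict(int)
    for rule_id, correct in pairs:
        totals[rule_id] += correct
    return dict(totals)


# The dataset D is the job input and arrives as splits, as if stored on
# different nodes. Each row is (features, label).
SPLITS = [
    [((1.0, 0.0), 0), ((3.0, 1.0), 1)],
    [((4.0, 0.0), 1), ((2.0, 1.0), 0)],
]

pairs = [pair for split in SPLITS for pair in map_split(split, POPULATION)]
fitness = reduce_counts(pairs)  # rule id -> rows classified correctly
```

Here the roles are swapped compared to solution A: the small population is
copied to every mapper, and the large dataset is what gets split.
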

