mahout-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Regarding Google Summer of Code Lucene Mahout Project
Date Mon, 24 Mar 2008 17:28:01 GMT

On Mar 24, 2008, at 11:07 AM, Robin Anil wrote:

> Hi Admins,
> I went through the Google Summer of Code wiki and found out about the
> mahout-machine-learning project. I would like to participate in
> implementing the papers. I am currently working on my B.Tech thesis,
> which is to extract opinionated sentences from blogs; this is also part
> of the Text Retrieval Conference (TREC) 2008 Blog Track<> under the
> guidance of Prof. Sudeshna Sarkar <>. For my TREC system, I have
> experimented with classifiers (NB, SVM, decision trees) and clustering
> algorithms (k-means and Gaussian mixtures). For the project I used the
> C# version of Lucene (Lucene.NET) to index and retrieve documents in the
> Blog06 Collection <blog06info.html> (160GB).
> I believe working on this project would help me further improve the
> performance and efficiency of the system I am working on, as well as
> ease me into working with the open source community.
> I am a 4th-year CS student at IIT Kharagpur working towards a dual
> degree (B.Tech + M.Tech), and this would be my first time working on an
> open-source project. Could you suggest what I should get comfortable
> with in order to implement this, and the level of detail you require in
> the implementation proposal?

I'd have a look at the wiki and the NIPS paper listed there, and also  
search the archives for GSOC discussions.  I'd also start looking into  
Hadoop and the existing code we have.  Then, just go ahead and make a  
proposal.  I'm particularly interested in classifiers, but I know  
there is a good deal of interest in clustering too (we already have a  
k-means impl).  For classifiers, I am slowly but surely working on a  
naive Bayes implementation (time is always a question for me), so  
implementing decision trees or SVMs would be really cool.

