mahout-dev mailing list archives

From "Goel, Ankur" <>
Subject RE: Hi
Date Tue, 12 Feb 2008 14:16:18 GMT
Hi Greg/Lucas,
              Thanks for your warm welcome! I have a single-class
implementation of the Probabilistic Latent Semantic Indexing (PLSI)
algorithm, which is based on the Expectation Maximization technique.
The technique is featured in the Google News Personalization paper.

It's a small, non-map-reduce prototype meant to demonstrate the algorithm
theory in action, as the PLSI concept itself is a little complex.

I believe it can serve as a good enough base for a map-reduce version of
the Expectation Maximization algorithm that we have on our list.

I just need to add some Javadoc comments so that it's clearer (the code
is compact, though).
I will create a JIRA issue and submit a patch as soon as it is ready.

Please feel free to file the JIRA issue for this if you like :-)
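
For readers new to PLSI, the EM procedure mentioned above can be sketched
roughly as follows. This is not the prototype described in this message
(that code is not shown in the thread); it is a minimal, single-machine
sketch that assumes the click data arrives as (user, item) pairs. The
names plsi_em, predict, n_topics and the helper _normalize are made up
for illustration.

```python
import random


def _normalize(row):
    """Scale a list of non-negative weights so it sums to 1."""
    total = sum(row)
    if total == 0.0:
        # No evidence for this row (e.g. a user with no clicks): stay uniform.
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsi_em(clicks, n_users, n_items, n_topics, n_iters=50, seed=0):
    """Fit p(z|u) and p(s|z) to (user, item) click pairs with plain EM."""
    rng = random.Random(seed)
    # Random positive starting points; identical rows would never separate.
    p_z_u = [_normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_users)]
    p_s_z = [_normalize([rng.random() + 0.1 for _ in range(n_items)])
             for _ in range(n_topics)]

    for _ in range(n_iters):
        acc_z_u = [[0.0] * n_topics for _ in range(n_users)]
        acc_s_z = [[0.0] * n_items for _ in range(n_topics)]
        for u, s in clicks:
            # E-step: posterior over latent classes for this click,
            # q(z | u, s) proportional to p(z | u) * p(s | z).
            joint = [p_z_u[u][z] * p_s_z[z][s] for z in range(n_topics)]
            total = sum(joint)
            if total == 0.0:
                continue
            for z in range(n_topics):
                q = joint[z] / total
                acc_z_u[u][z] += q
                acc_s_z[z][s] += q
        # M-step: re-estimate both tables from the expected counts.
        p_z_u = [_normalize(row) for row in acc_z_u]
        p_s_z = [_normalize(row) for row in acc_s_z]
    return p_z_u, p_s_z


def predict(p_z_u, p_s_z, user, item):
    """p(item | user) = sum over z of p(z | user) * p(item | z)."""
    return sum(p_z_u[user][z] * p_s_z[z][item] for z in range(len(p_s_z)))
```

The per-click accumulation inside the loop is presumably the part a
map-reduce version would distribute, though that design is not spelled
out in this thread.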


-----Original Message-----
From: Grant Ingersoll [] 
Sent: Monday, February 11, 2008 6:43 PM
Subject: Re: Hi

Hi Ankur,

Glad Mahout sounds like a good fit for you, as you sound like a good  
fit for us.   We are just getting off the ground, but have put  
together a number of resources, etc. on our Wiki at
  where we have a number of algorithms laid out and some initial goals

As for who is working on what, that is all coordinated (to the extent
that it is coordinated) via this mailing list.  I think, right now,  
people are waiting for something to hang their hat on, so to speak.   
Jeff Eastman has contributed an initial take on Canopy clustering.  As
for me, I am working on a naive bayes classifier, but it is slow going
for me at the moment.  We are also still waiting for a couple of other
committers to come on board (getting their paperwork and accounts set up).

So, if you like Collab Filtering, I would say feel free to submit a
patch (see the How To Contribute section on the Wiki), otherwise, if you
want to work on one of the algorithms we have picked out, then feel free
to jump in.  Also feel free to add any knowledge you have to the wiki.
Especially in these early stages, we will have to work to get over the
startup inertia of no code.  I think the best way to do this is to just
get patches up, no matter their state of "doneness",
and have others start trying them out.  We don't need to be perfect at
this stage; we just need things that people can start trying out and
correcting.


On Feb 11, 2008, at 7:58 AM, Goel, Ankur wrote:

> Hi Folks,
>         I have a budding interest in the area of machine learning and 
> information retrieval.
> Of late I have been working on collaboration-based filtering 
> algorithms that analyse user-click history to learn user behaviour and
> build associations between users and clicks.
> Just for the sake of info, I am pretty much familiar with Map-Reduce 
> programming model and have been using as the map-reduce framework for 
> my jobs.
> Having had a look at Apache Mahout, it seems like just the right place to 
> nurture my interest in the field. I am looking forward to making active 
> contributions to Mahout.
> Can anyone suggest how I can find out who is working on what?
> Thanks
> -Ankur
