mahout-user mailing list archives

From Isabel Drost <>
Subject Mahout interest at Berlin universities
Date Sun, 05 Jul 2009 19:37:28 GMT


I just wanted to let you know that during the last few months I was invited by
several (machine learning / information retrieval / database) research groups
here in Berlin to tell them more about Mahout and give a brief overview of

Usually I gave two example applications, explained the main motivation for
Mahout, introduced Hadoop at a very high level, and showed some strategies
for coming up with parallel solutions. After that I gave an overview of the
existing implementations in Mahout and explained why and how people can
participate.

My impression from those talks was that people are pretty interested in what 
is going on here. Some have set up their own Hadoop cluster and run
experiments on it. Some are planning to do so in the near future. A few even 
expressed interest in contributing to the project.

There were a few common reactions/ observations that I would like to share 
with you - comments, corrections, additions very welcome: 

People seem to be slowly becoming aware that there is something named Hadoop
that implements a parallel programming framework originally developed at
Google. However, the basic assumptions and implications (e.g. data locality)
are known to only a few groups or people, at least in the IR and data mining
domains.

Any time I asked people using Apache software whether they were subscribed
to the corresponding user mailing list, the answer was a puzzled look
followed by a no. I tried to make clear why participation is important - I
guess we will see in the near future whether I was successful ;)

I was surprised to see people only vaguely aware of the GSoC program. They
knew that it exists, but the general setup was not as widely known as I
would have expected. After all, among our GSoC proposals there seemed to be
quite a few students co-supervised by their university.

Concerning Mahout I got varying feedback: A few people who had a look at it
last autumn found it difficult to find the source code and documentation.
Some students had a look shortly after ApacheCon EU this year and found it
hard to set up a demo application. I think having JavaDoc, a tutorial, and
setup documentation for each release version on our website might help
people get started more easily.

Other than that, the general feedback seemed to be that we are doing
"surprisingly well" both in terms of the emerging community and in terms of
implementation progress over the first year.

Last but not least: From DIMA at TU Berlin I received the offer to do
a "Mahout seminar". It would consist of two parts: A theoretical one where
students read scientific publications, prepare a survey and give a talk by
the end of the semester. The other part would be a project where they could
work, for instance, on implementing an algorithm or on integrating already
existing implementations into a project. The goal would be to strengthen
their programming and project management skills and, along the way, have
them contribute back to the community.

My first thought was to prepare a task with the goal of building a new
blog "search engine". They could build a system that identifies clusters of
blogs on a common topic, works on the link graph of the blogosphere, detects
newly emerging topics and the like. Before preparing the final seminar
proposal, I would like to ask you whether there is anything you would like
those students to work on during the winter term.

Sorry for the overly longish e-mail...


  |\      _,,,---,,_       Web:   <>
  /,`.-'`'    -.  ;-;;,_  
 |,4-  ) )-,_..;\ (  `'-' 
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://>
