mahout-user mailing list archives

From Alan Said <Alan.S...@dai-labor.de>
Subject RE: Moving a twitter conversation to the mailing list
Date Mon, 08 Nov 2010 13:10:50 GMT
As Sebastian mentions, I'm going to try to make a scalable implementation. Being a Hadoop/Mahout
newbie, however, I'm not really sure how difficult this might end up being.

I intend to do a very general implementation which could be used for (Hy)PLSA as described
here: http://www.dai-labor.de/en/publication/403

/Alan
-- 
***************************************
M.Sc.(Eng.) Alan Said
Competence Center Information Retrieval & Machine Learning
Technische Universität Berlin / DAI-Lab
Sekr. TEL 14 Ernst-Reuter-Platz 7
10587 Berlin / Germany
Phone:  0049 - 30 - 314 74072
Fax:    0049 - 30 - 314 74003
E-mail: alan.said@dai-lab.de
http://www.dai-labor.de
***************************************

-----Original Message-----
From: Sebastian Schelter [mailto:ssc@apache.org] 
Sent: Monday, November 08, 2010 1:01 PM
To: user@mahout.apache.org
Subject: Moving a twitter conversation to the mailing list

I'm moving a twitter conversation to the mailing list so that it doesn't 
vanish in the short-lived microblogging sphere.

To summarize, @alansaid is looking for an implementation of the 
EM-algorithm as described here: 
https://cwiki.apache.org/confluence/display/MAHOUT/Expectation+Maximization. 
I could only point him to an unsuccessful implementation of PLSI tried 
at https://issues.apache.org/jira/browse/MAHOUT-106. While this one 
worked for tiny examples, it clearly didn't scale and it had some parts 
of the algorithm wrong IMHO. @sbourke tweeted about using it despite the 
scalability issues, but I would clearly discourage anyone from doing this.

Nevertheless, if Alan manages to make this work and scale, I think it 
would make a very nice contribution to Mahout. I guess we'd be willing 
to help, so Alan, if you need support, just ask on dev@. There's also a 
Mahout hackathon planned in Berlin; maybe that would be a good 
opportunity to work collaboratively on that implementation.

--sebastian
