mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Moving a twitter conversation to the mailing list
Date Tue, 09 Nov 2010 11:53:32 GMT
Just realized this morning that I had my acronyms backwards: they have a Maximum Entropy algorithm,
not an EM algorithm.

On Nov 8, 2010, at 9:07 PM, Grant Ingersoll wrote:

> The EM topic is interesting, as OpenNLP is in the process of moving towards Incubation
> (http://wiki.apache.org/incubator/OpenNLPProposal) at the ASF and they have an EM implementation.
> I've talked to them about bringing it into Mahout, but they are not interested in the extra
> complexity at the moment since it would add a lot of dependencies. We, however, could do
> the heavy lifting by taking it and making it scale, if it is possible.
> 
> -Grant
> 
> 
> On Nov 8, 2010, at 8:01 AM, Sebastian Schelter wrote:
> 
>> I'm moving a twitter conversation to the mailing list so that it doesn't vanish in
>> the short-lived microblogging sphere.
>> 
>> To summarize, @alansaid is looking for an implementation of the EM algorithm as described
>> here: https://cwiki.apache.org/confluence/display/MAHOUT/Expectation+Maximization. I could
>> only point him to an unsuccessful attempt at implementing PLSI at https://issues.apache.org/jira/browse/MAHOUT-106.
>> While this one worked for tiny examples, it clearly didn't scale, and it had some parts of
>> the algorithm wrong IMHO. @sbourke tweeted about using it despite the scalability issues, but I
>> would clearly discourage anyone from doing this.
>> 
>> Nevertheless, if Alan manages to make this work and scale, I think it would make a
>> very nice contribution to Mahout. I guess we'd be willing to help, so Alan, if you need support,
>> just ask on dev@. There's also a Mahout hackathon planned in Berlin; maybe that would be a
>> good opportunity to work collaboratively on that implementation.
>> 
>> --sebastian
> 

