I'm moving a Twitter conversation to the mailing list so that it doesn't
vanish in the short-lived microblogging sphere.
To summarize, @alansaid is looking for an implementation of the
EM-algorithm as described here:
https://cwiki.apache.org/confluence/display/MAHOUT/Expectation+Maximization.
I could only point him to an unsuccessful implementation of PLSI tried
at https://issues.apache.org/jira/browse/MAHOUT-106. While this one
worked for tiny examples, it clearly didn't scale and it had some parts
of the algorithm wrong IMHO. @sbourke tweeted about using it despite the
scalability issues, but I would clearly discourage anyone from doing this.
Nevertheless, if Alan manages to make this work and scale, I think it
would make a very nice contribution to Mahout. I guess we'd be willing
to help, so Alan, if you need support, just ask on dev@. There's also a
Mahout hackathon planned in Berlin; maybe that would be a good
opportunity to work collaboratively on that implementation.
--sebastian