mahout-user mailing list archives

From Joseph Turian <tur...@gmail.com>
Subject Mahout to find semantically related terms over a large vocabulary (>1M)?
Date Sat, 06 Nov 2010 02:11:38 GMT
I'm organizing a bakeoff. If you want to show off some Mahout skills
and do a controlled comparison of Mahout against other people's
approaches, here is the problem:

Let's say I have several hundred million documents, which are very
short (only a few words). There are several million terms in the
vocabulary. What is the fastest way to find the top-k semantically
related terms for each term in the vocabulary?

If you just want to hear the results, join this group:
http://groups.google.com/group/metaoptimize-challenge-announce

If you actually want to hack some data, read this blog post:
http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/

It would be really cool to see the Mahout community take part with a
Mahout demo, so we can get a controlled comparison against other
implementations.

Best,
  Joseph
