mahout-user mailing list archives

From jakobitsch juergen <>
Subject Re: Mahout to find semantically related terms over a large vocabulary (>1M)?
Date Sun, 07 Nov 2010 10:09:45 GMT
hi joseph, 

i'm very much interested in stuff like that. although i'm not a 
mahout guru, i'd be very glad to have a working sample, because
i can see very useful things...

i'm working with large thesauri in skos-format and am sure 
i could use working solutions in a couple of projects.

keep it up


----- Original Message ----
From: Joseph Turian <>
Sent: Sat, November 6, 2010 3:11:38 AM
Subject: Mahout to find semantically related terms over a large vocabulary (>1M)?

I'm organizing a bakeoff, if you want to show off some Mahout skills
and do a controlled comparison of Mahout to other people's approaches:

Let's say I have several hundred million documents, which are very
short (only a few words). There are several million terms in the
vocabulary. What is the fastest way to find the top-k semantically
related terms for each term in the vocabulary?
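
To make the task concrete, here is a minimal single-machine sketch of one possible baseline: count how often two terms appear in the same short document, score each pair by pointwise mutual information (PMI), and keep the k best neighbours per term. This is an illustration only, not Mahout code and not an approach proposed in this thread. The function name `top_k_related` and the parameters `k` and `min_count` are hypothetical, and holding all pair counts in memory would not scale to hundreds of millions of documents without a distributed or approximate variant.

```python
import heapq
import math
from collections import Counter, defaultdict
from itertools import combinations


def top_k_related(docs, k=2, min_count=1):
    """Return {term: [(related_term, pmi), ...]} from document-level co-occurrence.

    Hypothetical helper for illustration; whitespace tokenization is an assumption.
    """
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    n_docs = 0

    for doc in docs:
        # Deduplicate and sort so each unordered pair has one canonical key.
        terms = sorted(set(doc.lower().split()))
        if not terms:
            continue
        n_docs += 1
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))

    candidates = defaultdict(list)
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_count:
            continue  # skip pairs seen too rarely to score reliably
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with document-frequency estimates.
        pmi = math.log(n_ab * n_docs / (term_df[a] * term_df[b]))
        candidates[a].append((b, pmi))
        candidates[b].append((a, pmi))

    # Keep only the k highest-scoring neighbours for each term.
    return {
        term: heapq.nlargest(k, scored, key=lambda pair: pair[1])
        for term, scored in candidates.items()
    }
```

Calling `top_k_related(docs, k=10)` on a list of short strings returns, for each term, up to ten neighbours ordered by PMI. Raising `min_count` filters out pairs that co-occur only once or twice, which PMI otherwise tends to overrate.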

If you just want to hear the results, join this group:

If you actually want to hack some data, read this blog post:

It would be really cool to see participation from the Mahout community
in a Mahout demo, to get a controlled comparison to other


