mahout-user mailing list archives

From Paul Jones <>
Subject Re: mahout PLSI (with some lucene, thrown in)
Date Tue, 23 Jun 2009 22:09:11 GMT
Okay, I have seen the difficulty (apart from the maths :-)).

I guess "similar" can mean many things, e.g. hyponyms, but words such as hot...cold are
also "related". Hence, to solve my little problem, I am wondering if there is an easier way,
i.e. to use existing hyponym relations (WordNet and the like), and/or,
where those do not exist, to use something similar to a "Google distance measure", which may help
in "adding" new words to the system....
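
To make the "Google distance" idea concrete, here is a minimal sketch, assuming the
measure meant is the Normalized Google Distance (NGD) computed from page counts. The
function name is hypothetical, and the counts come from reading the quoted question
below as 100 pages with only "red", 100 with only "crimson" and 100 with both, out of
100K crawled pages (so 200 pages contain each word).

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page counts (hypothetical helper, not a Mahout or Lucene API).

    fx, fy -- number of pages containing each term
    fxy    -- number of pages containing both terms
    n      -- total number of pages in the index
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Assumed reading of the example: 200 pages mention each word, 100 mention both.
ngd = normalized_google_distance(200, 200, 100, 100_000)
print(f"NGD(red, crimson) = {ngd:.3f}")
```

Values near 0 suggest closely related terms; with these counts the distance comes out
at roughly 0.11. In practice the counts could come from document frequencies in an
existing index rather than from a web search engine.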


From: Ted Dunning <>
Sent: Tuesday, 23 June, 2009 18:00:12
Subject: Re: mahout PLSI (with some lucene, thrown in)

Yes.  This can be done.  It isn't necessarily real simple to do.

See for an
old (but still pretty good) example.

On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones <> wrote:

> Imagine we have crawled 100K webpages, and we have 100 pages which show
> "red" and 100 which show "crimson" and then 100 which show both "red" and
> "crimson". Is there a way to deduce that there may be a (albeit weak)
> relationship between "red" and "crimson"? Of course we can pre-seed this info,
> which then gets weighted by actual results.
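
The counts in the quoted question can be turned into an association score. What follows
is a minimal sketch, assuming pointwise mutual information (PMI) as the measure; PMI is
chosen here only for illustration and is not something named in the thread. It also
assumes the three groups of 100 pages are disjoint, so each word appears on 200 pages
in total. The function name is hypothetical.

```python
import math

def pmi_bits(fx, fy, fxy, n):
    """Pointwise mutual information in bits, from page counts.

    fx, fy -- number of pages containing each term
    fxy    -- number of pages containing both terms
    n      -- total number of pages crawled
    """
    if fxy == 0:
        return -math.inf  # the terms never co-occur
    # p(x, y) / (p(x) * p(y)) simplifies to (fxy * n) / (fx * fy)
    return math.log2((fxy * n) / (fx * fy))

# Assumed reading: 100 "red" only + 100 both = 200 pages per word, 100K pages total.
score = pmi_bits(200, 200, 100, 100_000)
print(f"PMI(red, crimson) = {score:.2f} bits")
```

A score of 0 means the two words co-occur about as often as chance would predict; a
clearly positive score, as here, means they appear together far more often than that.
PMI is known to overrate rare words, so small counts should be treated with caution.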
