mahout-user mailing list archives

From Paul Jones <paul_jone...@yahoo.co.uk>
Subject Re: mahout PLSI (with some lucene, thrown in)
Date Tue, 23 Jun 2009 22:09:11 GMT
Okay, I have seen the difficulty (apart from the maths :-)).

I guess "similar" can mean many things, e.g. hyponyms, but words such as hot...cold are
also "related". So, to solve my little problem, I am wondering whether there is an easier way,
i.e. to use existing hyponym relations (WordNet and the like) and/or, where those do not
exist, to use something similar to a "Google distance measure" to help
in "adding" new words to the system.
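
As a rough illustration of the "Google distance" idea, here is a minimal sketch of the Normalized Google Distance formula computed from page counts. The function name and the example counts are assumptions for illustration only (the counts read the red/crimson example quoted below as 100 pages with only "red", 100 with only "crimson", and 100 with both, out of 100K crawled pages); nothing here is Mahout or Lucene code.

```python
import math

def normalized_google_distance(n_x, n_y, n_both, n_total):
    """Normalized Google Distance from raw page counts.

    n_x, n_y  -- pages containing each term (including pages with both)
    n_both    -- pages containing both terms
    n_total   -- total number of pages indexed
    Smaller values mean the terms tend to appear together.
    """
    if n_both == 0:
        # The terms never co-occur, so the distance is unbounded.
        return float("inf")
    log_x, log_y = math.log(n_x), math.log(n_y)
    numerator = max(log_x, log_y) - math.log(n_both)
    denominator = math.log(n_total) - min(log_x, log_y)
    return numerator / denominator

# Assumed reading of the example: 200 pages mention "red", 200 mention
# "crimson", 100 mention both, in a crawl of 100,000 pages.
ngd = normalized_google_distance(200, 200, 100, 100_000)
print(f"NGD(red, crimson) = {ngd:.3f}")
```

A value near 0 suggests the two words nearly always appear together; values around 1 or above suggest they are unrelated in this crawl.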

Paul




________________________________
From: Ted Dunning <ted.dunning@gmail.com>
To: mahout-user@lucene.apache.org
Sent: Tuesday, 23 June, 2009 18:00:12
Subject: Re: mahout PLSI (with some lucene, thrown in)

Yes.  This can be done.  It isn't necessarily real simple to do.

See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7275 for an
old (but still pretty good) example.

On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones <paul_jonez99@yahoo.co.uk> wrote:

> Imagine we have crawled 100K webpages, and we have 100 pages which show
> "red", 100 which show "crimson", and then 100 which show both "red and
> crimson". Is there a way to deduce that there may be an (albeit weak)
> relationship between red AND crimson? Of course we can pre-seed this info,
> which then gets weighted by actual results.
>
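
To make the red/crimson question concrete, the sketch below scores the co-occurrence with two common association measures: pointwise mutual information and a log-likelihood ratio (G^2) test on a 2x2 contingency table. This is only one possible approach, not necessarily the method in the paper linked above. The function names are hypothetical, and the counts assume the 100/100/100 figures mean "red only", "crimson only", and "both".

```python
import math

def pmi(n_both, n_x, n_y, n_total):
    """Pointwise mutual information in bits.

    n_x and n_y count every page containing the term, including
    pages that contain both terms.
    """
    if n_both == 0:
        return float("-inf")
    return math.log2((n_both * n_total) / (n_x * n_y))

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of page counts.

    k11 -- pages with both terms     k12 -- pages with only the first term
    k21 -- pages with only the second term   k22 -- pages with neither
    Near 0 means the terms look independent; large means associated.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, k12), (k21, k22))
    score = 0.0
    for i in range(2):
        for j in range(2):
            observed = cells[i][j]
            if observed > 0:
                expected = rows[i] * cols[j] / total
                score += observed * math.log(observed / expected)
    return 2.0 * score

# Assumed counts: 100 pages with both words, 100 with only "red",
# 100 with only "crimson", and the remaining 99,700 with neither.
print("PMI:", pmi(100, 200, 200, 100_000))
print("LLR:", log_likelihood_ratio(100, 100, 100, 99_700))
```

Under these assumed counts both scores come out well above what independence would give, which is the kind of signal that could be combined with pre-seeded relations such as WordNet entries.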



      