mahout-user mailing list archives

From Tommy Chheng <to...@peoplejar.com>
Subject Re: mahout PLSI (with some lucene, thrown in)
Date Tue, 23 Jun 2009 22:19:25 GMT
Have you looked at WordNet to get the hyponyms?

Tommy

On Jun 23, 2009, at 3:09 PM, Paul Jones wrote:

> Okay, have seen the difficulty (apart from the maths :-)).
>
> I guess "similar" can mean many things, i.e. hyponyms, but words such as
> hot...cold are also "related". Hence, to solve my little problem, I am
> wondering if there is an easier way, i.e. to use existing hyponym
> relations (WordNet and the like), and/or, where they do not exist, I guess
> using something similar to a "google distance measure" may help in
> "adding" new words to the system....
>
> Paul
>
>
>
>
> ________________________________
> From: Ted Dunning <ted.dunning@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Tuesday, 23 June, 2009 18:00:12
> Subject: Re: mahout PLSI (with some lucene, thrown in)
>
> Yes.  This can be done.  It isn't necessarily real simple to do.
>
> See http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7275
> for an old (but still pretty good) example.
>
> On Tue, Jun 23, 2009 at 6:45 AM, Paul Jones <paul_jonez99@yahoo.co.uk> wrote:
>
>> Imagine we have crawled 100K webpages, and we have 100 pages which show
>> "red", 100 which show "crimson", and then 100 which show both "red and
>> crimson". Is there a way to deduce that there may be an (albeit weak)
>> relationship between red AND crimson? Of course we can pre-seed this info,
>> which then gets weighted by actual results.
>>
>
>
>
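
For what it's worth, the red/crimson numbers at the bottom of the thread can be run through two simple co-occurrence scores. This is only a rough sketch, not the method from the paper Ted linked: it uses pointwise mutual information (PMI) and the normalized "google distance" formula that Paul alludes to. It assumes the three groups of 100 pages are disjoint, so "red" appears on 200 pages in total and "crimson" on 200. The function names `pmi` and `ngd` are made up for illustration.

```python
import math


def pmi(n_x, n_y, n_xy, n_total):
    """Pointwise mutual information (log base 2) from document counts."""
    p_x = n_x / n_total
    p_y = n_y / n_total
    p_xy = n_xy / n_total
    return math.log2(p_xy / (p_x * p_y))


def ngd(n_x, n_y, n_xy, n_total):
    """Normalized Google Distance from document counts (0 = always together)."""
    log_x, log_y, log_xy = math.log(n_x), math.log(n_y), math.log(n_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_total) - min(log_x, log_y))


# Counts from the question; treating the three groups as disjoint is an assumption.
N = 100_000
red_only, crimson_only, both = 100, 100, 100
n_red = red_only + both          # pages containing "red" at all
n_crimson = crimson_only + both  # pages containing "crimson" at all

red_crimson_pmi = pmi(n_red, n_crimson, both, N)
red_crimson_ngd = ngd(n_red, n_crimson, both, N)

print(f"PMI(red, crimson) = {red_crimson_pmi:.2f} bits")
print(f"NGD(red, crimson) = {red_crimson_ngd:.3f}")
```

A PMI well above zero, or a distance close to zero, says the two words share pages far more often than independent words would. Both scores get unreliable when the counts are small, so a threshold on the raw co-occurrence count is probably worth adding before trusting them.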

