lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Searching for similar documents
Date Sat, 16 Jul 2005 07:30:30 GMT
We've already got this in Lucene's contrib/ (see MoreLikeThis in the similarity package):

$ ll contrib/similarity/src/java/org/apache/lucene/search/similar/*java

-rwxrwxr-x  1 otis otis 30431 Jul  9 09:20 MoreLikeThis.java*
-rwxrwxr-x  1 otis otis  3612 Mar 16 17:31 SimilarityQueries.java*

Otis

--- "Kadlabalu, Hareesh" <hareesh.kadlabalu@fatwire.com> wrote:

> Hi,
> I am trying to build a search utility that looks for 'similarities'
> between documents.
> In other words, for every document listed as part of the search results
> for a phrase, I want to be able to list documents that are similar to it
> (but that do not necessarily match the same search criterion). For
> example, if my search for "Tomcat" returned "Tomcat installation guide",
> I want to write a utility that looks for all similar installation guides,
> whether or not they are related to Tomcat.
> 
> One approach I am considering is to use term vectors. Algorithm: first
> extract the top X terms from the current document's term vector and
> create a Boolean query for those terms. Run it against the contents of
> the other documents (I will probably have to remove commonly used terms
> manually?). The resulting documents should be similar to the original
> one.
> 
> I am wondering whether something like this already exists, or whether
> someone has a better algorithm/solution. Or am I headed in the wrong
> direction with this algorithm? Your advice is highly appreciated.
> 
> Thanks
> -Hareesh 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


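For illustration, here is a minimal plain-Java sketch of the term-vector approach Hareesh describes. It is not Lucene's MoreLikeThis API and does not touch an index. Assumptions (all hypothetical, chosen for the sketch): a tiny in-memory corpus stands in for the index, splitting on non-word characters stands in for an Analyzer, terms are weighted by tf * ln(N/df), and the "Boolean query" is an all-optional (OR) query that scores each other document by summing the weights of the query terms it contains. A term that occurs in every document gets idf 0 and is dropped, which handles the "commonly used terms" concern without a hand-maintained stop list.

```java
import java.util.*;

/**
 * Hypothetical sketch: "top X terms from a term vector -> OR query".
 * A toy in-memory corpus stands in for a Lucene index and its term vectors.
 */
public class SimilarDocsSketch {

    private final List<Map<String, Integer>> vectors = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();

    // Term vector for one document: term -> frequency within that document.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> tv = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) tv.merge(tok, 1, Integer::sum);
        }
        return tv;
    }

    public SimilarDocsSketch(List<String> docs) {
        for (String d : docs) {
            Map<String, Integer> tv = termVector(d);
            vectors.add(tv);
            // Document frequency: number of documents containing each term.
            for (String t : tv.keySet()) docFreq.merge(t, 1, Integer::sum);
        }
    }

    // idf = ln(N / df); a term present in every document gets 0.
    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return df == 0 ? 0.0 : Math.log((double) vectors.size() / df);
    }

    // Step 1: top X terms of a document by tf * idf, ties broken alphabetically.
    public List<String> topTerms(int docId, int x) {
        Map<String, Integer> tv = vectors.get(docId);
        List<String> terms = new ArrayList<>();
        for (String t : tv.keySet()) if (idf(t) > 0) terms.add(t);
        terms.sort(Comparator.comparingDouble((String t) -> -tv.get(t) * idf(t))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(x, terms.size())));
    }

    // Step 2: treat the top terms as an OR query and rank the other documents
    // by the summed tf * idf of the query terms each one contains.
    public List<Integer> similar(int docId, int x) {
        List<String> query = topTerms(docId, x);
        Map<Integer, Double> scores = new HashMap<>();
        for (int d = 0; d < vectors.size(); d++) {
            if (d == docId) continue; // skip the source document itself
            double s = 0;
            for (String t : query) s += vectors.get(d).getOrDefault(t, 0) * idf(t);
            if (s > 0) scores.put(d, s);
        }
        List<Integer> hits = new ArrayList<>(scores.keySet());
        hits.sort(Comparator.comparingDouble((Integer id) -> -scores.get(id))
                .thenComparing(Comparator.<Integer>naturalOrder()));
        return hits;
    }

    public static void main(String[] args) {
        SimilarDocsSketch s = new SimilarDocsSketch(List.of(
                "Tomcat installation guide: install Tomcat server",
                "Apache httpd installation guide: install server",
                "Tomcat performance tuning",
                "Cooking pasta recipe"));
        System.out.println(s.topTerms(0, 3)); // [tomcat, guide, install]
        System.out.println(s.similar(0, 3));  // [1, 2]
    }
}
```

With these four sample documents, the Apache installation guide (document 1) ranks above the Tomcat tuning note (document 2), and the pasta recipe does not match at all. That is the "similar installation guides, Tomcat or not" behaviour the question asks for. Against a real index, the MoreLikeThis class in the listing above is the place to start rather than hand-rolling this.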