lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject RE: Searching for similar documents
Date Sat, 16 Jul 2005 14:52:20 GMT
They should work with the version in SVN, as well as with 1.4.3.

Otis


--- "Kadlabalu, Hareesh" <hareesh.kadlabalu@fatwire.com> wrote:

> Thanks Otis!
> 
> Are there Lucene 1.4-compatible versions of these classes?
> 
> Thanks very much
> -Hareesh
> 
> -----Original Message-----
> From: Otis Gospodnetic
> To: java-user@lucene.apache.org
> Sent: 7/16/2005 3:30 AM
> Subject: Re: Searching for similar documents
> 
> We've got this in Lucene's contrib/:
> 
> $ ll contrib/similarity/src/java/org/apache/lucene/search/similar/*java
> 
> -rwxrwxr-x  1 otis otis 30431 Jul  9 09:20 MoreLikeThis.java*
> -rwxrwxr-x  1 otis otis  3612 Mar 16 17:31 SimilarityQueries.java*
> 
> Otis
> 
> --- "Kadlabalu, Hareesh" <hareesh.kadlabalu@fatwire.com> wrote:
> 
> > Hi, 
> > I am trying to build a search utility that looks for 'similarities'
> > between documents. In other words, for every document listed as part
> > of the search results for a phrase, I want to be able to list
> > documents that are similar to it (but do not necessarily match the
> > same search criterion). For example, if my search for "Tomcat"
> > returned "Tomcat installation guide", I want to write a utility that
> > looks for all similar installation guides that may or may not be
> > related to Tomcat.
> > 
> > One approach I am considering is to use term vectors. Algorithm:
> > first extract the top X terms from the current document's term
> > vector and create a Boolean query from those terms. Run it against
> > the contents of other documents (I will probably have to remove
> > commonly used terms manually?). The resulting documents should be
> > similar to the original one.
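
A minimal sketch of the approach described above, written in plain Java rather than against Lucene's API so that it runs on its own. Everything in it is an assumption made for illustration: the class name SimilarDocs, the tiny stop-word list, the whitespace/punctuation tokenizer, and the ranking by number of matched terms, which is only a crude stand-in for an OR-style Boolean query and does not reproduce Lucene's scoring.

```java
import java.util.*;

class SimilarDocs {
    // Hypothetical stop-word list; a real one would be much larger.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "for", "how", "in", "of", "on", "the", "to"));

    // Count how often each non-stop-word term occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (tok.isEmpty() || STOP_WORDS.contains(tok)) continue;
            tf.merge(tok, 1, Integer::sum);
        }
        return tf;
    }

    // Return the x most frequent terms; ties are broken alphabetically.
    static List<String> topTerms(String text, int x) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(termFrequencies(text).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(x, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Rank the other documents by how many of the source's top terms
    // they contain; documents matching none of the terms are dropped.
    static List<String> similar(String source, Map<String, String> others, int x) {
        List<String> query = topTerms(source, x);
        Map<String, Integer> hits = new HashMap<>();
        for (Map.Entry<String, String> e : others.entrySet()) {
            Set<String> terms = termFrequencies(e.getValue()).keySet();
            int matched = 0;
            for (String t : query) {
                if (terms.contains(t)) matched++;
            }
            if (matched > 0) hits.put(e.getKey(), matched);
        }
        List<String> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int byHits = Integer.compare(hits.get(b), hits.get(a));
            return byHits != 0 ? byHits : a.compareTo(b);
        });
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> others = new LinkedHashMap<>();
        others.put("jetty", "Jetty installation guide");
        others.put("tuning", "Tomcat performance tuning");
        others.put("recipes", "Cooking recipes for pasta");
        String source = "Tomcat installation guide: installation steps for Tomcat";
        System.out.println(topTerms(source, 3));
        System.out.println(similar(source, others, 3));
    }
}
```

Note that with "tomcat" among the top terms, other Tomcat documents still match; dropping the terms of the original search phrase from the query would be one way to favour installation guides for other products.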
> > 
> > I am wondering if something like this already exists or someone has
> > a better algorithm/solution. Or am I headed off in the wrong
> > direction with this algorithm? Your advice is highly appreciated.
> > 
> > Thanks
> > -Hareesh 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 

