lucene-java-user mailing list archives

From Magnus Johansson
Subject Re: Similar Document Search
Date Tue, 19 Aug 2003 07:12:26 GMT
Hi Peter

If the original document is available, you could extract keywords from
the document at query time. That is, when someone asks for documents
similar to document a, you re-analyze document a and, in combination
with statistics from the Lucene index, extract keywords from document a
that can then be used as a query for finding similar documents.

I've got some sample code if anyone is interested.
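
In the meantime, here is a minimal sketch of the idea in plain Java. It is
not the sample code mentioned above and it does not call the Lucene API.
The class name SimilarityKeywords, the method topKeywords, and the
docFreq/numDocs parameters are all hypothetical: docFreq and numDocs stand
in for the document-frequency statistics you would read from the index,
and the tokenizer is a crude stand-in for a real analyzer. Terms are
scored with a plain tf * idf weight, which is one reasonable choice among
several.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: picks query keywords for a "similar documents" search.
class SimilarityKeywords {

    /**
     * Returns the k highest-scoring terms of the given text.
     * docFreq maps a term to the number of indexed documents containing it,
     * numDocs is the total number of indexed documents. Both are assumed
     * to come from the index; here they are passed in directly.
     */
    static List<String> topKeywords(String text, Map<String, Integer> docFreq,
                                    int numDocs, int k) {
        // Re-analyze the document: lowercase, split on non-letters,
        // and count how often each term occurs.
        Map<String, Integer> termFreq = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.length() < 3) {
                continue; // skip empty and very short tokens
            }
            termFreq.merge(token, 1, Integer::sum);
        }

        // Combine with index statistics: frequent in this document,
        // rare in the collection gives a high score.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Sort by descending score, breaking ties alphabetically.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }
}
```

The returned terms would then be OR-ed together into an ordinary query
against the index. How many terms to keep (k) and whether to drop very
common or very rare terms are tuning decisions this sketch leaves open.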


Peter Becker wrote:

> Hi Terry,
> we have been thinking about the same problem and in the end we decided 
> that most likely the only good solution to this is to keep a 
> non-inverted index, i.e. a map from the documents to the terms. Then 
> you can look up the terms of a document and query for other 
> documents matching some of them (where you get the usual question of 
> which terms are actually interesting: high frequency, low frequency 
> or the mid range).
> Indexing would probably be quite expensive since Lucene doesn't seem 
> to support changes in the index, and the index for the terms would 
> change all the time. We haven't implemented it yet, but it shouldn't 
> be hard to code. I just wouldn't expect good performance when indexing 
> large collections.
> Peter
>
> Terry Steichen wrote:
>> Is it possible without extensive additional coding to use Lucene to 
>> conduct a search based on a document rather than a query?  (One use 
>> of this would be to refine a search by selecting one of the hits 
>> returned from the initial query and subsequently retrieving other 
>> documents "like" the selected one.)
>> Regards,
>> Terry