lucene-java-user mailing list archives

From Peter Becker <>
Subject Re: Similar Document Search
Date Tue, 19 Aug 2003 01:05:39 GMT
Hi Terry,

we have been thinking about the same problem, and in the end we decided 
that most likely the only good solution is to keep a non-inverted index, 
i.e. a map from the documents to their terms. Then you can look up the 
terms for a document and query for other documents matching some of 
them (where you get the usual question of which terms are actually 
interesting: high frequency, low frequency or the mid range).
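
To make the idea concrete, here is a rough sketch using plain in-memory 
maps. It does not use any Lucene API; the class ForwardIndexSketch and 
its methods are made up for illustration, and the document-frequency 
band (minDf, maxDf) is just one possible answer to the "which terms are 
interesting" question.

```java
import java.util.*;

// Hypothetical illustration only: not part of Lucene.
class ForwardIndexSketch {
    // Non-inverted index: document id -> set of terms in that document.
    private final Map<String, Set<String>> docTerms = new HashMap<>();
    // Document frequency: term -> number of documents containing it.
    private final Map<String, Integer> docFreq = new HashMap<>();

    // Assumes each document id is added only once (no updates or deletes).
    void add(String docId, String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        docTerms.put(docId, terms);
        for (String t : terms) docFreq.merge(t, 1, Integer::sum);
    }

    // Terms of the document whose document frequency lies in [minDf, maxDf].
    Set<String> interestingTerms(String docId, int minDf, int maxDf) {
        Set<String> result = new TreeSet<>();
        for (String t : docTerms.getOrDefault(docId, Collections.emptySet())) {
            int df = docFreq.get(t);
            if (df >= minDf && df <= maxDf) result.add(t);
        }
        return result;
    }

    // Other documents, ranked by how many interesting terms they share.
    List<String> similar(String docId, int minDf, int maxDf) {
        Set<String> wanted = interestingTerms(docId, minDf, maxDf);
        Map<String, Integer> overlap = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : docTerms.entrySet()) {
            if (e.getKey().equals(docId)) continue;
            int shared = 0;
            for (String t : wanted) {
                if (e.getValue().contains(t)) shared++;
            }
            if (shared > 0) overlap.put(e.getKey(), shared);
        }
        List<String> ranked = new ArrayList<>(overlap.keySet());
        // Stable sort: ties keep the alphabetical order of the TreeMap keys.
        ranked.sort((a, b) -> overlap.get(b) - overlap.get(a));
        return ranked;
    }
}
```

In a real setup the second step would presumably be an ordinary query 
against the inverted index built from the selected terms, rather than 
the linear scan over all documents used here.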

Indexing would probably be quite expensive, since Lucene doesn't seem to 
support changes to an existing index, and the index for the terms would 
change all the time. We haven't implemented it yet, but it shouldn't be 
hard to code. I just wouldn't expect good performance when indexing large 


Terry Steichen wrote:

>Is it possible without extensive additional coding to use Lucene to conduct a search based
>on a document rather than a query?  (One use of this would be to refine a search by selecting
>one of the hits returned from the initial query and subsequently retrieving other documents
>"like" the selected one.)
