lucene-java-user mailing list archives

From "Gregor Heinrich" <Gregor.Heinr...@igd.fhg.de>
Subject RE: Similar Document Search
Date Wed, 20 Aug 2003 14:47:05 GMT
Hello Terry,

Lucene can do forward indexing, as Mark Rosen outlines in his Master's
thesis: http://citeseer.nj.nec.com/rosen03email.html.

We use a similar approach for (probabilistic) latent semantic analysis and
vector space searches. However, our solution is not finalized yet, so we
have no code to share at this time.
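
For readers unfamiliar with vector space searches, here is a minimal,
generic sketch of the idea. It is not the code mentioned above and it does
not use any Lucene API. It assumes plain term-frequency vectors and cosine
similarity; the class name TermVectorCosine and its methods are hypothetical
names chosen for illustration, and the tokenization is deliberately naive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: compares two documents by the
// cosine of the angle between their term-frequency vectors.
class TermVectorCosine {

    // Build a term -> frequency map using naive lowercase word splitting
    // (an assumption; a real analyzer would do stemming, stop words, etc.).
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> vector = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                vector.merge(token, 1, Integer::sum);
            }
        }
        return vector;
    }

    // Cosine similarity: dot product divided by the product of the norms.
    // Returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double norm = Math.sqrt(sumSquares(a)) * Math.sqrt(sumSquares(b));
        return norm == 0.0 ? 0.0 : dot / norm;
    }

    private static double sumSquares(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int freq : vector.values()) {
            sum += (double) freq * freq;
        }
        return sum;
    }
}
```

Documents sharing no terms score 0.0, and a document compared with itself
scores (up to rounding) 1.0. Term weighting such as tf-idf is left out here
to keep the sketch short.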

Best regards,

Gregor




-----Original Message-----
From: Peter Becker [mailto:pbecker@dstc.edu.au]
Sent: Tuesday, August 19, 2003 3:06 AM
To: Lucene Users List
Subject: Re: Similar Document Search


Hi Terry,

we have been thinking about the same problem, and in the end we decided
that the only good solution is most likely to keep a non-inverted index,
i.e. a map from the documents to their terms. Then you can look up the
terms of the selected document and query for other documents matching some
of them (which raises the usual question of which terms are actually
interesting: high frequency, low frequency or the mid range).
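
The idea above can be sketched roughly as follows. This is a plain in-memory
illustration, not Lucene code and not something we have implemented: the
class ForwardIndexSketch and its methods are hypothetical names, the
tokenization is naive, and picking the highest-frequency terms is just one
assumed answer to the "which terms are interesting" question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: keeps a forward (non-inverted) index next to an
// inverted one, both in memory. No Lucene classes are involved.
class ForwardIndexSketch {
    // Forward index: document id -> (term -> frequency in that document).
    private final Map<String, Map<String, Integer>> docTerms = new HashMap<>();
    // Inverted index: term -> ids of the documents containing it.
    private final Map<String, Set<String>> termDocs = new HashMap<>();

    void add(String docId, String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            freqs.merge(token, 1, Integer::sum);
            termDocs.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
        docTerms.put(docId, freqs);
    }

    // The n most frequent terms of a document; ties are broken alphabetically.
    // Choosing high-frequency terms is an assumption made for this sketch.
    List<String> topTerms(String docId, int n) {
        Map<String, Integer> freqs =
                docTerms.getOrDefault(docId, Collections.emptyMap());
        List<String> terms = new ArrayList<>(freqs.keySet());
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(freqs.get(b), freqs.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // Other documents ranked by how many of the selected terms they contain.
    List<String> similar(String docId, int n) {
        Map<String, Integer> overlap = new TreeMap<>();
        for (String term : topTerms(docId, n)) {
            for (String other : termDocs.getOrDefault(term, Collections.emptySet())) {
                if (!other.equals(docId)) {
                    overlap.merge(other, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(overlap.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(overlap.get(b), overlap.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked;
    }
}
```

For example, after adding a few documents with add(id, text), calling
similar("a", 2) returns the ids of the other documents that contain at
least one of the two most frequent terms of document "a", best match first.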

Indexing would probably be quite expensive, since Lucene doesn't seem to
support changing existing index entries, and the entries for the terms
would change all the time. We haven't implemented it yet, but it shouldn't
be hard to code. I just wouldn't expect good performance when indexing
large collections.

  Peter


Terry Steichen wrote:

>Is it possible without extensive additional coding to use Lucene to conduct
>a search based on a document rather than a query?  (One use of this would be
>to refine a search by selecting one of the hits returned from the initial
>query and subsequently retrieving other documents "like" the selected one.)
>
>Regards,
>
>Terry
>
>
>


