lucene-java-user mailing list archives

From "Michael O'Leary" <mich...@seomoz.org>
Subject Getting a similarity score for an arbitrary pair of documents or a query and a document
Date Wed, 06 Mar 2013 08:14:13 GMT
Is there an API in Lucene for finding the similarity score for two
documents that have been randomly pulled from an index? What about for a
query and a randomly selected document?

I realize this isn't the standard purpose of Lucene, but I was given a task
to compare similarity scores for the Similarity classes defined in Lucene
4.x using a somewhat large predefined set of documents and query strings.
My current approach is to index the documents in a separate index for each
Similarity class, search using the query strings, locate the subset of
documents in the results that I am interested in, and record their scores.
This is taking quite a long time.

I am about to look through the Lucene source code to see how the Similarity
classes are used in normal use cases such as search and more-like-this. If
someone could point me to where to look or, even better, knows of an API
function that takes a pair of documents, or a query and a document, and
returns a similarity score for them, I would greatly appreciate it.
Thanks,
Mike
