lucene-java-user mailing list archives

From Emmanuel Espina <espinaemman...@gmail.com>
Subject Re: Getting a similarity score for an arbitrary pair of documents or a query and a document
Date Wed, 06 Mar 2013 14:55:32 GMT
Have you already checked Solr's MoreLikeThis
(http://wiki.apache.org/solr/MoreLikeThisHandler and
http://wiki.apache.org/solr/MoreLikeThis)? You describe a problem similar to
the use case of that component, and if there is something to hack, it is
Solr's MoreLikeThis.
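Roughly speaking, MoreLikeThis picks the most "interesting" terms of a source document, ranked by tf * idf, and turns them into an ordinary query. Below is a simplified, self-contained sketch of that term-selection step only. It assumes the classic Lucene idf formula 1 + ln(numDocs / (docFreq + 1)); the class and method names are hypothetical, and it leaves out the frequency cutoffs and analyzer handling the real component applies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Simplified sketch of MoreLikeThis-style term selection: rank a document's
 * terms by tf * idf and keep the top k. Hypothetical helper, not Lucene code.
 */
class InterestingTerms {

    // Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /**
     * @param termFreqs term -> frequency in the source document
     * @param docFreqs  term -> number of indexed documents containing the term
     * @param numDocs   total number of documents in the index
     * @param k         maximum number of terms to return
     */
    static List<String> topTerms(Map<String, Integer> termFreqs,
                                 Map<String, Integer> docFreqs,
                                 int numDocs, int k) {
        List<String> terms = new ArrayList<>(termFreqs.keySet());
        // Sort by descending tf * idf; ties are broken alphabetically for determinism.
        terms.sort(Comparator
                .comparingDouble((String t) ->
                        -termFreqs.get(t) * idf(docFreqs.getOrDefault(t, 0), numDocs))
                .thenComparing(Comparator.<String>naturalOrder()));
        return terms.subList(0, Math.min(k, terms.size()));
    }
}
```

A very common word such as "the" gets an idf near 1, so it only makes the cut when nothing rarer competes with it.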

Lucene's Similarity is a low-level class that some queries (for example,
TermQuery) use to compute scores, but from what you describe I don't think
you need something that low level.
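To make "low level" concrete: a Similarity turns per-term statistics (term frequency, document frequency, document length) into a score. The sketch below computes the textbook BM25 formula for a bag-of-words query against one document. It uses the idf form ln(1 + (N - n + 0.5) / (n + 0.5)) that Lucene's BM25Similarity uses, with its default k1 = 1.2 and b = 0.75. This is an illustration under those assumptions, not Lucene's implementation. Lucene stores document length in a lossy one-byte norm and applies query boosts, so its numbers will differ somewhat. For a query and a document that are already in an index, IndexSearcher.explain(Query, int) returns an Explanation whose getValue() is that document's score under the searcher's current Similarity. That may be closer to what the original question asks for.

```java
import java.util.List;
import java.util.Map;

/**
 * Textbook BM25 score of a bag-of-words query against one document.
 * Hypothetical helper for illustration; not Lucene's BM25Similarity.
 */
class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of document-length normalization

    // idf in the form ln(1 + (N - n + 0.5) / (n + 0.5))
    static double idf(long docFreq, long numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * @param queryTerms   query terms (a repeated term is counted once per occurrence)
     * @param docTermFreq  term -> frequency in the document
     * @param docFreqs     term -> number of documents containing the term
     * @param numDocs      number of documents in the collection
     * @param docLength    length of this document, in terms
     * @param avgDocLength average document length in the collection
     */
    static double score(List<String> queryTerms, Map<String, Integer> docTermFreq,
                        Map<String, Long> docFreqs, long numDocs,
                        double docLength, double avgDocLength) {
        // Length normalization depends only on the document, so compute it once.
        double norm = K1 * (1 - B + B * docLength / avgDocLength);
        double total = 0.0;
        for (String term : queryTerms) {
            int tf = docTermFreq.getOrDefault(term, 0);
            if (tf == 0) continue;  // terms absent from the document contribute nothing
            double tfPart = tf * (K1 + 1) / (tf + norm);
            total += idf(docFreqs.getOrDefault(term, 0L), numDocs) * tfPart;
        }
        return total;
    }
}
```

Because of the saturating tf part, the tenth occurrence of a term adds far less to the score than the first.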

Thanks
Emmanuel


2013/3/6 Michael O'Leary <michael@seomoz.org>

> Is there an API in Lucene for finding the similarity score for two
> documents that have been randomly pulled from an index? What about for a
> query and a randomly selected document?
>
> I realize this isn't the standard purpose of Lucene, but I was given a task
> to compare similarity scores for the Similarity classes defined in Lucene
> 4.x using a somewhat large predefined set of documents and query strings.
> Collecting the results by indexing the documents into separate indexes, one
> per Similarity class, searching with the query strings, locating the subset
> of documents in the results that I am interested in, and recording their
> scores is taking quite a long time.
>
> I am about to look through the Lucene source code to see how the Similarity
> classes are used in normal use cases such as search and more-like-this, but
> if someone could point me to where to look, or, even better, knows of an
> API function that takes a pair of documents, or a query and a document, and
> returns a similarity score for them, I would greatly appreciate it.
> Thanks,
> Mike
>
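For the two-document case, one common approach outside Lucene's scoring model is cosine similarity between the documents' term-frequency vectors. In Lucene 4.x those vectors can be read back with IndexReader.getTermVector(docID, field), provided the field was indexed with term vectors enabled. The self-contained sketch below assumes they have already been copied into plain maps from term to frequency. This is a swapped-in technique, not the score any of Lucene's Similarity classes would produce, and CosineSketch is a hypothetical name.

```java
import java.util.Map;

/** Cosine similarity between two sparse term-frequency vectors. Hypothetical helper. */
class CosineSketch {

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // Iterate over the smaller map; only terms present in both contribute to the dot product.
        Map<String, Integer> small = a.size() <= b.size() ? a : b;
        Map<String, Integer> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : small.entrySet()) {
            Integer other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue().doubleValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;  // an empty document is treated as similar to nothing
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }
}
```

Raw frequencies let common words dominate. Weighting each entry by idf before taking the cosine is a typical refinement.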
