lucene-java-user mailing list archives

From Morus Walter <>
Subject distinct queries for search and scoring
Date Tue, 17 Feb 2009 10:10:19 GMT

I'm currently thinking about what the best solution would be for the
following request:

- a lucene index should be queried for a number of search criteria
- the score for each result should not be the normal query score, but an
  indicator of the similarity between the matched document and some
  other conditions that can be expressed as a query as well.

The use case is something like a search for jobs (defined by arbitrary
user input) and a scoring based on similarity to a user's profile
(basically his CV).

This can certainly be done in various ways:
- get the scores from a score query; do the main search then and attach
  the scores to the results
- do the main search first and then the score query using the results
  of the main as a filter (the score query might need a small
  modification to match for all documents)
- combine the searches into one and make the scoring contribution of the
  main query negligible
- see if it's possible to run two scorers at a time and combine the
  results; of course one scorer would have to score documents in an
  order defined by the other (that's just a vague idea; I haven't checked
  the low level APIs thoroughly yet, so maybe this does not work at all)
but I don't have a clear idea what the performance expectations for the
different ways might be.
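
To make the first two options concrete, here is a minimal sketch in plain
Java of the join they both boil down to. It deliberately does not use any
Lucene API: ScoreJoin, Hit and rank are hypothetical names, the main
search result is assumed to be available as an array of document ids, and
the score query result as a map from document id to score. Documents that
the score query did not match get a score of 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): joins main-search hits with scores from a separate score query. */
final class ScoreJoin {

    /** A document id paired with the score taken from the score query. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Keeps exactly the documents matched by the main search and orders them
     * by the score query's score, highest first. Ties are broken by doc id.
     */
    static List<Hit> rank(int[] mainHits, Map<Integer, Float> profileScores) {
        List<Hit> hits = new ArrayList<>(mainHits.length);
        for (int doc : mainHits) {
            Float s = profileScores.get(doc);
            // Documents the score query did not match are kept with score 0.
            hits.add(new Hit(doc, s == null ? 0f : s));
        }
        hits.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        return hits;
    }
}
```

Under these assumptions the cost is one map lookup per main-search hit
plus a sort, so it grows with the size of the main result rather than with
the (usually smaller) number of documents the score query matches.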

So before I start experimenting I'd like to ask if anyone on the list
has ever done something like this (or thought about it) or has other
insights that might be helpful.

The indices in question are medium size (50k/200k documents; but that
might increase up to a few million). The main query might match a large
part of that index (up to all documents), as we do an incremental search
where each user input results in a search even if the complete search
criteria aren't provided yet. The number of documents having a score
larger than 0 (that is, matching the score query) is usually smaller but
might reach a few thousand.

