lucene-java-user mailing list archives

From "Trieschnigg, R.B. (Dolf)"
Subject RE: Different scoring mechanism
Date Fri, 09 Jun 2006 11:48:05 GMT
> : ! If a document does not contain a query term, this score can be larger
> : or smaller than 0 !
>
> If a document doesn't contain a term, then the scorer for that query will
> never even try to score that document -- regardless of what your
> Similarity class looks like.
>
> If you really want this kind of behavior, you'll need to roll your own
> TermQuery/TermScorer classes and change next and skipTo to always advance
> to the next doc -- regardless of whether or not it matches (you can check
> for that in the score function and act accordingly).

That sounds like a reasonable approach. However, I still need the search to be optimized for
retrieving the first n hits. (I wrote my own implementation outside the Lucene search
architecture, and it was unbelievably slow.)
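
To make the suggested approach concrete, here is a minimal sketch of a scorer whose next and skipTo visit every document id and whose score function checks for a match. It is plain Java and does not extend Lucene's Scorer class; the class name, the in-memory posting arrays, the use of the raw term frequency as the score, and the missingScore parameter are all assumptions made for illustration.

```java
// Hypothetical stand-in for a custom term scorer; not Lucene's API.
final class ExhaustiveTermScorerSketch {
    private final int[] postingDocs;   // doc ids containing the term, ascending
    private final int[] postingFreqs;  // term frequency for each posting
    private final int maxDoc;          // number of documents in the index
    private final float missingScore;  // assumed score for docs without the term
    private int doc = -1;              // current doc id
    private int pos = 0;               // cursor into the posting arrays

    ExhaustiveTermScorerSketch(int[] postingDocs, int[] postingFreqs,
                               int maxDoc, float missingScore) {
        this.postingDocs = postingDocs;
        this.postingFreqs = postingFreqs;
        this.maxDoc = maxDoc;
        this.missingScore = missingScore;
    }

    // Advance to the next doc id, whether or not it contains the term.
    boolean next() {
        doc++;
        // Move the posting cursor past ids smaller than the current doc.
        while (pos < postingDocs.length && postingDocs[pos] < doc) {
            pos++;
        }
        return doc < maxDoc;
    }

    // Jump to the first doc id >= target that lies beyond the current doc.
    boolean skipTo(int target) {
        doc = Math.max(doc + 1, target) - 1;
        return next();
    }

    int doc() {
        return doc;
    }

    // Placeholder scoring: raw term frequency on a match, missingScore otherwise.
    float score() {
        boolean matches = pos < postingDocs.length && postingDocs[pos] == doc;
        return matches ? postingFreqs[pos] : missingScore;
    }

    // Convenience helper: score every document in doc id order.
    static float[] scoreAll(int[] docs, int[] freqs, int maxDoc, float missingScore) {
        ExhaustiveTermScorerSketch s =
            new ExhaustiveTermScorerSketch(docs, freqs, maxDoc, missingScore);
        float[] scores = new float[maxDoc];
        while (s.next()) {
            scores[s.doc()] = s.score();
        }
        return scores;
    }

    public static void main(String[] args) {
        float[] scores = scoreAll(new int[] {1, 3}, new int[] {2, 5}, 5, -0.5f);
        for (int i = 0; i < scores.length; i++) {
            System.out.println("doc " + i + " -> " + scores[i]);
        }
    }
}
```

Note that a scorer like this touches all maxDoc documents for every term, which is exactly the cost I am worried about.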

For example, take a query with two terms, "fast" and "car", with document frequencies of 300,000
and 20,000 in the index, respectively. In the worst case (no document contains both terms) this
would require 320,000 document scores to be calculated. I am not really sure how Lucene optimizes
its search, but I guess it first processes the documents with the highest term frequencies for
these query terms (and thus the highest combined score), and prunes the search once n hits have
been found and it is certain that no remaining document can give a higher score.
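
As a point of comparison, one generic way to keep only the best n hits is a bounded min-heap over the scored candidates. The sketch below is plain Java, not Lucene code; the class name TopNSketch, the flat scores array indexed by doc id, and the tie-break in favour of the lower doc id are assumptions for illustration. It bounds memory and sorting work to n entries, but it still looks at every candidate score, so on its own it does not provide the pruning I am guessing at above.

```java
import java.util.PriorityQueue;

// Hypothetical helper: select the n best-scoring doc ids with a bounded min-heap.
final class TopNSketch {
    // Returns the doc ids of the n highest scores, best first.
    static int[] topN(float[] scores, int n) {
        if (n <= 0) {
            return new int[0];
        }
        // Order by score; on ties the higher doc id counts as the weaker hit.
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> {
            int c = Float.compare(scores[a], scores[b]);
            return c != 0 ? c : Integer.compare(b, a);
        });
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < n) {
                heap.add(doc);
            } else if (heap.comparator().compare(doc, heap.peek()) > 0) {
                // The new doc beats the weakest kept hit, so replace it.
                heap.poll();
                heap.add(doc);
            }
        }
        // Drain weakest-first into the array from the back, giving best-first order.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.1f, 0.9f, 0.4f, 0.9f, 0.2f};
        for (int doc : topN(scores, 2)) {
            System.out.println("doc " + doc + " score " + scores[doc]);
        }
    }
}
```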

If I changed the next function in my own scorer to process all document ids, I am afraid
I would wreck Lucene's optimization method (as I would then no longer be serving the documents
in descending term frequency order).

Perhaps someone can tell me whether Lucene indeed requires <scorer>.next() to return the
documents for a term in descending term frequency order?

