lucene-java-user mailing list archives

From "Trieschnigg, R.B. (Dolf)" <r.b.trieschn...@ewi.utwente.nl>
Subject RE: Different scoring mechanism
Date Fri, 09 Jun 2006 11:48:05 GMT
> : ! If a document does not contain a query term, this score can be larger
> : or smaller than 0 !
> 
> if a document doesn't contain a term, then the scorer for 
> that query will never even try to score that document -- 
> regardless of what your Similarity class looks like.
> 
> if you really want this kind of behavior, you'll need to roll 
> your own TermQuery/TermScorer classes and change next and 
> skipTo to always advance to the next doc -- regardless of 
> whether or not it matches (you can check for that in the score 
> function and act accordingly)

That sounds like a reasonable approach. However, I still need the search to be optimized
for retrieving the first n hits. (I wrote my own implementation outside the Lucene search
architecture, and it was unbelievably slow.)
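
To make the quoted suggestion concrete, here is a minimal sketch of a scorer whose next and skipTo visit every document id and whose score function checks whether the term actually matched. It is not Lucene code: DenseTermScorerSketch, its constructor arguments and absentScore are hypothetical names, the postings are assumed to be plain arrays sorted by ascending document id, and the square-root weighting is only a placeholder for whatever a Similarity would compute.

```java
/** Hypothetical stand-in for a custom term scorer; not Lucene's TermScorer. */
class DenseTermScorerSketch {
    private final int[] postingDocs;   // ids of documents containing the term (assumed ascending)
    private final int[] postingFreqs;  // term frequency for each posting
    private final int maxDoc;          // number of documents in the index
    private final float absentScore;   // score for a document without the term (may be < 0 or > 0)
    private int doc = -1;              // current document id
    private int pos = 0;               // index of the first posting with id >= doc

    DenseTermScorerSketch(int[] postingDocs, int[] postingFreqs, int maxDoc, float absentScore) {
        this.postingDocs = postingDocs;
        this.postingFreqs = postingFreqs;
        this.maxDoc = maxDoc;
        this.absentScore = absentScore;
    }

    /** Advances to the next document id, whether or not it contains the term. */
    boolean next() {
        doc++;
        sync();
        return doc < maxDoc;
    }

    /** Advances to the first document id >= target, always moving forward at least once. */
    boolean skipTo(int target) {
        doc = Math.max(doc + 1, target);
        sync();
        return doc < maxDoc;
    }

    int doc() {
        return doc;
    }

    /** Checks for a match here, as the quoted advice suggests, and scores accordingly. */
    float score() {
        boolean matches = pos < postingDocs.length && postingDocs[pos] == doc;
        return matches ? (float) Math.sqrt(postingFreqs[pos]) : absentScore;
    }

    // Moves the posting cursor up to the current document id.
    private void sync() {
        while (pos < postingDocs.length && postingDocs[pos] < doc) {
            pos++;
        }
    }
}
```

With this shape, every term scorer computes maxDoc scores per query, independent of the term's document frequency, which is exactly the cost I am worried about.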

For example, take a query with two terms, "fast" and "car", which have document frequencies of
300,000 and 20,000 in the index respectively. In the worst case, when no document contains both
terms, this would require 320,000 document scores to be calculated. I am not really sure how
Lucene optimizes its search, but I guess it first processes the documents with the highest term
frequencies for these query terms (and thus the highest combined score), and prunes the search
once n hits have been found and it is certain that no remaining document can give a higher
score.
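
For reference, the baseline I am comparing against can be sketched as follows: score every candidate and keep only the n best in a bounded min-heap. This is an illustrative sketch with hypothetical names (TopNSketch, topN, worstCaseScored), not a description of Lucene's internals, and it does no pruning at all. The worst-case count is simply the sum of the two document frequencies, reached when the posting lists do not overlap.

```java
import java.util.PriorityQueue;

/** Hypothetical helper illustrating exhaustive scoring with top-n collection. */
class TopNSketch {

    /** Returns the n highest scores in descending order, using a min-heap of size at most n. */
    static float[] topN(float[] scores, int n) {
        if (n <= 0) {
            return new float[0];
        }
        PriorityQueue<Float> heap = new PriorityQueue<>(); // smallest kept score on top
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();   // evict the current weakest hit
                heap.add(s);
            }
        }
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll(); // heap yields ascending order, so fill from the back
        }
        return out;
    }

    /** Upper bound on scores computed for a two-term OR query with disjoint posting lists. */
    static long worstCaseScored(long dfA, long dfB) {
        return dfA + dfB;
    }
}
```

Here TopNSketch.worstCaseScored(300000, 20000) gives the 320,000 figure above. The heap only bounds the memory for the hits; every candidate is still scored once.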

If I changed the next function in my own scorer to process all document ids, I am afraid
I would wreck Lucene's optimization method (as I would then no longer be serving the documents
in descending term frequency order).

Perhaps someone can tell me whether Lucene indeed requires <scorer>.next() to return the
documents for a term in descending term frequency order?

Regards,
Dolf
