lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <>
Subject Re: Query time document group boosting
Date Thu, 27 Nov 2008 19:55:46 GMT

27 nov 2008 kl. 10.15 skrev Toke Eskildsen:

> On Thu, 2008-11-27 at 07:30 +0100, Karl Wettin wrote:
>> The most scary part is that that you will have to score each and  
>> every
>> document that has a source, probably all of the documents in your
>> corpus.
> I now see my query-logic was flawed. In order to avoid matching all
> documents every time, the query would have to be
> "foo AND (
>  groupboost_A:dummy^10 OR
>  groupboost_B:dummy OR
>  groupboost_C:dummy^0.1 OR
>  ...
>  groupboost_Z:dummy
> )"
> With that query, it seems that only documents matching foo will result
> in a hit and be scored?

Someone else than me needs to answer this. I know there is no  
optimization of boolean clauses, that is why I'm saying this: it is  
possible that the boolean query weight actually will be visiting all  
the inner clauses even though "foo" was not matched, i.e. all  
documents in the index are visited but might not all be scored.

A cosmetic remark, I would personally choose a single field for the  
boosts and then one token per source. (groupboost:A^10 groupboost:B^1  

>> I think you are looking for CustomScoreQuery.
> Possibly, but my understanding is too weak to see how I can avoid a
> substantial performance-hit for the check for source?

If I'm not misstaken CustomScoreQuery is a non matching query, that it  
only touch the score of something that is already matched by a  
subquery. If these statments are true then it feels like there are  
clock ticks to save here.

But maybe all of these things are a bit too preemptive. Just use what  
you have if it seems to work well enough.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message