lucene-java-user mailing list archives

From Toke Eskildsen
Subject Re: Query time document group boosting
Date Wed, 03 Dec 2008 10:25:42 GMT
On Tue, 2008-12-02 at 23:42 +0100, Chris Hostetter wrote:
> : A cosmetic remark, I would personally choose a single field for the boosts and
> : then one token per source. (groupboost:A^10 groupboost:B^1 groupboost:C^0.1).
> that's a key improvement, as it helps keep the number of unique fields 
> down, even if the number of sources grows without bounds.  make sure you 
> omitNorms on your groupboost field, and when building your various boolean 
> queries, consider disabling the coord (check the docs to understand why 
> that might make sense)

Ah yes, with norms, the score would be heavily influenced by the size of the group.
As for the number of groups, the downside to this method is clearly that all 
documents must have group status defined for each distinct set of groups.


Maybe this could be solved by a shared "dummy"-group that is defined for all 
documents? If we have two collections of groups: 
 * Source with the groups A-Z
 * Availability with the groups Available and Unavailable

Document 0
 * groupfield:dummy
 * groupfield:A
 * groupfield:Available
Document 1
 * groupfield:dummy
 * groupfield:Unavailable
Document 2
 * groupfield:dummy
 * groupfield:B
Document 3
 * groupfield:dummy

If we want to boost documents from Source A, we make the query
foo AND (groupfield:dummy OR groupfield:A^10)

Similarly, if we want to demote documents with Availability Unavailable,
the query would be
foo AND (groupfield:dummy OR groupfield:Unavailable^0.1)
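To make the pattern concrete, here is a minimal sketch of how such query
strings could be assembled before handing them to the query parser. It is plain
Java with no Lucene dependency; the class name GroupBoostQuery and its methods
are hypothetical, and it assumes the user query is a single term (or already
parenthesized) and that group tokens need no escaping.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: it only builds the query string.
public class GroupBoostQuery {

    // Formats a boost without a trailing ".0", e.g. 10.0 -> "10", 0.1 -> "0.1".
    static String formatBoost(double boost) {
        return BigDecimal.valueOf(boost).stripTrailingZeros().toPlainString();
    }

    // Appends the always-matching dummy clause plus one boosted clause per group.
    // Assumes userQuery is a single term or already wrapped in parentheses.
    static String build(String userQuery, String field, String dummyToken,
                        Map<String, Double> boosts) {
        StringBuilder sb = new StringBuilder();
        sb.append(userQuery).append(" AND (")
          .append(field).append(':').append(dummyToken);
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            sb.append(" OR ").append(field).append(':').append(e.getKey())
              .append('^').append(formatBoost(e.getValue()));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the clauses in insertion order.
        Map<String, Double> boostA = new LinkedHashMap<>();
        boostA.put("A", 10.0);
        System.out.println(build("foo", "groupfield", "dummy", boostA));

        Map<String, Double> demoteUnavailable = new LinkedHashMap<>();
        demoteUnavailable.put("Unavailable", 0.1);
        System.out.println(build("foo", "groupfield", "dummy", demoteUnavailable));
    }
}
```

Because the dummy clause matches every document, the boosted clauses only
change the score and never restrict the result set. Several groups can be
boosted or demoted in the same query by adding more entries to the map.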


It would probably be cleaner to use a MatchAllDocsQuery instead of the dummy,
but I'm a bit unsure about the combined scoring. I'll have to experiment
further with this.
