> For multiword queries, I would like to reward documents that contain a more
> even distribution of each word and penalize documents that have a skewed
> distribution. For example, if my search query is:
>
> +content:fast +content:car
>
> I would prefer a document that contains each word an equal number of times
> over a document that contains the word "fast" 100 times and the word "car" 1
> time. In other words, I would like to compare the scores of each
> BooleanQuery term and adjust the score according to the distribution.
>
> Can somebody point me in the right direction as to how I would implement
> this?
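That is already built in: DefaultSimilarity.tf() returns the square root of
the term frequency, so for the same total number of occurrences an even
spread across the query terms scores higher than a skewed one:

  sqrt(1) + sqrt(1) = 2  >  sqrt(0) + sqrt(2) ~= 1.41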
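
To make the arithmetic concrete, here is a minimal sketch in plain Java. It
does not call Lucene at all. It only mimics the square-root tf factor and sums
it over the two query terms, ignoring idf, norms, coord and boosts, which the
real score also includes. The class name TfBalanceSketch and the helper
tfSum() are hypothetical names made up for this illustration.

```java
// Sketch only: mimics a square-root term-frequency factor in plain Java.
// It does not use Lucene and ignores idf, norms, coord and boosts.
final class TfBalanceSketch {

    // Square root of the raw term frequency, the shape tf() is described as having.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Hypothetical helper: sum of the tf factors for a two-term query
    // such as +content:fast +content:car.
    static double tfSum(int freqFast, int freqCar) {
        return tf(freqFast) + tf(freqCar);
    }

    public static void main(String[] args) {
        // Same total number of occurrences, different distribution.
        System.out.println("even   (1, 1):   " + tfSum(1, 1));
        System.out.println("skewed (0, 2):   " + tfSum(0, 2));
        System.out.println("even   (50, 51): " + tfSum(50, 51));
        System.out.println("skewed (100, 1): " + tfSum(100, 1));
    }
}
```

Note that the square root only dampens repetition; it does not enforce
balance. In this sketch a document with 100 "fast" and 1 "car" still sums
higher than one with a single occurrence of each, because it has far more
matches overall. The even document only wins when the totals are comparable.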
