lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <>
Subject Re: Relevancy Practices
Date Wed, 05 May 2010 14:10:08 GMT
Thanks, Peter.

Can you share what kind of evaluations you did to determine that the end user believed the
results were equally relevant?  How formal was that process?


On May 3, 2010, at 11:08 AM, Peter Keegan wrote:

> We discovered very soon after going to production that Lucene's scores were
> often 'too precise'. For example, a page of 25 results may have several
> different score values, and all within 15% of each other, but to the end
> user all 25 results were equally relevant. Thus we wanted the secondary sort
> field to determine the order, instead. This required writing a custom score
> comparator to 'round' the scores. The same thing occurred for distance
> sorting. We also limit the effect of term frequency to help prevent
> spamming.  In comparison to Avi, we use 'AND' as the default operator for
> keyword queries and if no docs are found, the query is automatically retried
> with 'OR'. This improves precision a bit and only occurs if the user
> provides no operators.
> Lucene's Explanation class has been invaluable in helping me to explain a
> particular sort order in many, many situations.
> Most of our relevance tuning has occurred after deployment to production.
> Peter
> On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll <>wrote:
>> I'm putting on a talk at Lucene Eurocon (
>> on "Practical
>> Relevance" and I'm curious as to what people put in practice for testing and
>> improving relevance.  I have my own inclinations, but I don't want to muddy
>> the water just yet.  So, if you have a few moments, I'd love to hear
>> responses to the following questions.
>> What worked?
>> What didn't work?
>> What didn't you understand about it?
>> What tools did you use?
>> What tools did you wish you had either for debugging relevance or "fixing"
>> it?
>> How much time did you spend on it?
>> How did you avoid over/under tuning?
>> What stage of development/testing/production did you decide to do relevance
>> tuning?  Was that timing planned or not?
>> Thanks,
>> Grant

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message