lucene-java-user mailing list archives

From Ivan Provalov <>
Subject Re: Relevancy Practices
Date Tue, 04 May 2010 00:41:53 GMT

We are currently working on a relevancy improvement project.  We took IBM's paper from
TREC 2007 and followed the approaches it describes to improve Lucene's relevance.  The paper also
gave us some idea of Lucene’s out-of-the-box precision, measured as mean average precision (MAP).
In addition, we used some of the best practices described in the TREC book (Voorhees 2005, MIT).
We also looked into the probabilistic scoring model (BM25).

We started by comparing “vanilla” Lucene to our Lucene-based product’s performance.
We obtained the collections and the judgments from past TREC evaluations that were close to the
genre of the content we store.  We then proceeded to study how different tunings affected
the scores.  We used Lucene's benchmarking module to run against the TREC data.  Even though
there were a few issues with old TREC document/topic formats along the way, this benchmarking
tool was altogether great in helping us find the MAP and measure where we stood.
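
For readers unfamiliar with the metric, here is a minimal sketch of how MAP is computed from ranked results and relevance judgments.  It is not the benchmark module's code; the function names and the dictionary-based input format are assumptions made for illustration, and each ranked list is assumed to contain no duplicate document ids.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision for one topic.

    Mean of precision@k taken at every rank k that holds a relevant document.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Divide by all judged-relevant docs, so relevant docs that were
    # never retrieved lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(runs, judgments):
    """MAP over all judged topics.

    runs:      topic id -> ranked list of doc ids (hypothetical format)
    judgments: topic id -> iterable of relevant doc ids (hypothetical format)
    """
    topics = list(judgments)
    if not topics:
        return 0.0
    total = sum(average_precision(runs.get(t, []), judgments[t]) for t in topics)
    return total / len(topics)
```

A topic with no retrieved results contributes an average precision of zero, so missing topics pull the mean down rather than being skipped.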

Then we applied the Sweet Spot similarity, Pivot Point document length normalization (Lnb/Ltc),
and BM25 scoring algorithms.  After applying these scoring changes and other techniques
(different stemmers, query expansion), we saw some improvements in MAP.  We then
compared the results to our current production system and started tuning it as well.
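
To make the scoring changes above concrete, here is a rough sketch of the textbook BM25 per-term score and of a pivoted length normalization factor.  This is not the code we applied to Lucene; the function names, the particular IDF variant, and the default values of k1, b, and slope are illustrative assumptions only.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document.

    k1 and b are commonly quoted defaults, used here as assumptions.
    """
    # One common non-negative IDF variant (an assumption, others exist).
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Documents longer than average are penalized in proportion to b.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


def pivoted_length_norm(doc_len, pivot, slope=0.25):
    """Pivoted document length normalization factor.

    pivot is typically the average document length.  A slope below 1
    penalizes documents longer than the pivot less than plain 1/doc_len
    would, and boosts documents shorter than the pivot less.
    """
    return 1.0 / ((1.0 - slope) * pivot + slope * doc_len)
```

For a document of exactly average length with tf = 1, the BM25 score reduces to the IDF alone, which is a quick sanity check when comparing against query Explanation output.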

Our second goal was to include relevance measurement in the continuous integration
tests that run nightly.  The thought here is that if a change to the system inadvertently
affects scoring, we will find out right away.  This second phase also helped us discover
hidden bugs in our production system.
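
A nightly check of this kind can be as simple as comparing the current MAP with a stored baseline.  The sketch below is hypothetical: the function name, the tolerance value, and the way the baseline is stored are assumptions, not a description of our actual test harness.

```python
def check_map_regression(current_map, baseline_map, tolerance=0.01):
    """Compare tonight's MAP with a stored baseline.

    Returns (ok, message).  ok is False when MAP dropped by more than
    `tolerance` (an arbitrary illustrative threshold).
    """
    drop = baseline_map - current_map
    ok = drop <= tolerance
    status = "OK" if ok else "REGRESSION"
    message = "%s: MAP %.4f vs baseline %.4f (drop %.4f, tolerance %.4f)" % (
        status, current_map, baseline_map, drop, tolerance)
    return ok, message


# Example use inside a nightly job (values are made up):
ok, message = check_map_regression(current_map=0.25, baseline_map=0.30)
print(message)
```

An improvement produces a negative drop and passes; the baseline would then be updated deliberately rather than automatically, so that gradual drift does not go unnoticed.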

In addition to the English-based analyzers, we also studied Chinese analyzers and compared
the results with the English collection runs.  We used TREC data for that.

Some observations:
1.	Even though the Vector Space model with a Boolean OR query gives good MAP scores, in some
products the large number of returned results makes the product less usable.  So defaulting
to the AND operator may be a better option, as was mentioned earlier in this user group thread.
2.	This TREC-based evaluation is just one of many tools to use.  For example, user feedback is
still the most important evaluation one can do.
3.	We will continue studying how different scoring mechanisms affect relevance quality before
making a decision whether to switch from the default VSM.  Some of our concerns are over-tuning
and performance testing.
4.	The Lucene user community has been very helpful.  Robert Muir, Joaquin Iglesias, and others
helped us apply the scoring algorithms and provided great suggestions.
5.	Some of the tools we use constantly: Lucene’s query Explanation and Luke.
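
To illustrate observation 1, here is a toy sketch of how the default operator changes the size of the result set.  The tiny in-memory index and the function name are made up for illustration and have nothing to do with Lucene's actual query parser.

```python
def match_docs(index, terms, operator="OR"):
    """Return the set of doc ids matching the query terms.

    index: term -> iterable of doc ids (a toy inverted index)
    operator: "OR" matches docs containing any term,
              "AND" matches docs containing every term.
    """
    postings = [set(index.get(term, ())) for term in terms]
    if not postings:
        return set()
    if operator == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


toy_index = {
    "lucene": {1, 2, 3, 4},
    "relevance": {2, 4, 5, 6, 7},
    "tuning": {4, 7, 8},
}
query = ["lucene", "relevance", "tuning"]
print(len(match_docs(toy_index, query, "OR")))   # every doc with any term
print(len(match_docs(toy_index, query, "AND")))  # only docs with all terms
```

With OR, ranking has to push the many partial matches down the list; with AND, they never appear, at the cost of returning nothing when no document contains every term.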


Ivan Provalov

--- On Thu, 4/29/10, Grant Ingersoll <> wrote:

> From: Grant Ingersoll <>
> Subject: Relevancy Practices
> To:
> Date: Thursday, April 29, 2010, 10:14 AM
> I'm putting on a talk at Lucene
> Eurocon (
> on "Practical Relevance" and I'm curious as to what people
> put in practice for testing and improving relevance.  I
> have my own inclinations, but I don't want to muddy the
> water just yet.  So, if you have a few moments, I'd
> love to hear responses to the following questions.
> What worked?  
> What didn't work?  
> What didn't you understand about it?  
> What tools did you use?  
> What tools did you wish you had either for debugging
> relevance or "fixing" it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide
> to do relevance tuning?  Was that timing planned or
> not?
> Thanks,
> Grant

