lucene-dev mailing list archives

From: Grant Ingersoll <>
Subject: Re: search quality - assessment & improvements
Date: Mon, 25 Jun 2007 18:56:39 GMT

On Jun 25, 2007, at 2:04 PM, Doug Cutting wrote:

> Doron Cohen wrote:
>> It is very important that we be able to assess search quality in a
>> repeatable manner - so that anyone can repeat the quality tests and
>> maybe find ways to improve them.  (This would also let us verify the
>> "improvement claims" above...)  This capability seems like a natural
>> part of the benchmark package.  I started to look at extending the
>> benchmark package with a search quality module that would open an
>> index (or first create one), run a set of queries (similar to the
>> performance benchmark), and compute and report the set of known
>> statistics mentioned above, and more.  Such a module depends on
>> input data - documents, queries, and judgments.  And that's my
>> second question.  We don't have to invent this data - TREC already
>> has it, and it grows wider every year as more judgments are added.
>> So, theoretically, we could use TREC data.
> We should be careful not to tune things too much for any one  
> application and/or dataset.  Tools to perform evaluation would  
> clearly be valuable.  But changes that improve Lucene's results on  
> TREC data may or may not be of general utility.  The best way to  
> tune an application is to sample its query stream and evaluate
> those queries against its documents.

+1.  To do this, we could use Reuters or Wikipedia.  The hard part is
generating the queries and having people make relevance judgments for
a sufficiently large sample.  Over time it would get better,
especially if we had a nice way for people to add queries/judgments
without going through the patch/commit process (maybe a page on the
wiki could hold them, though that could get tricky).  With a lower
barrier to contributing, we might get more support from outsiders.
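
To make that concrete: contributed judgments could use the TREC qrels
convention - one line per (query, document) pair, in the form
"queryID iteration docName relevance", e.g.:

  1  0  reuters-00042  1
  1  0  reuters-00137  0
  2  0  reuters-00998  1

And Doron's module could start out as little more than the loop
below.  This is just a sketch against the Lucene 2.x API - the file
formats, the "contents"/"docname" field names, and the class itself
are made up for illustration; nothing like it exists in the benchmark
package yet:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Iterator;
  import java.util.Map;
  import java.util.Set;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;

  /** Sketch of a quality run: for each judged query, search and report
   *  precision at N.  Assumes the judged doc ids live in a stored
   *  "docname" field - that's a convention we'd have to pick. */
  public class QualitySketch {

    public static void main(String[] args) throws Exception {
      IndexSearcher searcher = new IndexSearcher(args[0]); // index dir
      Map judgments = loadJudgments(args[1]);              // qrels file
      Map queries = loadQueries(args[2]);                  // "qid<TAB>text" file
      QueryParser parser =
        new QueryParser("contents", new StandardAnalyzer());
      int n = 10;
      for (Iterator it = queries.keySet().iterator(); it.hasNext();) {
        String qid = (String) it.next();
        Set relevant = (Set) judgments.get(qid);
        if (relevant == null) continue;                    // unjudged query
        Hits hits = searcher.search(parser.parse((String) queries.get(qid)));
        int found = 0;
        for (int i = 0; i < n && i < hits.length(); i++) {
          if (relevant.contains(hits.doc(i).get("docname"))) {
            found++;
          }
        }
        System.out.println(qid + "  P@" + n + " = " + ((float) found / n));
      }
      searcher.close();
    }

    /** qrels lines "qid iter docname rel": keep docnames with rel > 0. */
    private static Map loadJudgments(String file) throws Exception {
      Map map = new HashMap();
      BufferedReader in = new BufferedReader(new FileReader(file));
      for (String line = in.readLine(); line != null; line = in.readLine()) {
        String[] f = line.split("\\s+");
        if (Integer.parseInt(f[3]) > 0) {
          Set set = (Set) map.get(f[0]);
          if (set == null) {
            map.put(f[0], set = new HashSet());
          }
          set.add(f[2]);
        }
      }
      in.close();
      return map;
    }

    /** query lines "qid<TAB>query text". */
    private static Map loadQueries(String file) throws Exception {
      Map map = new HashMap();
      BufferedReader in = new BufferedReader(new FileReader(file));
      for (String line = in.readLine(); line != null; line = in.readLine()) {
        int tab = line.indexOf('\t');
        map.put(line.substring(0, tab), line.substring(tab + 1));
      }
      in.close();
      return map;
    }
  }

Averaging P@10 over queries, and adding recall or MAP, is
straightforward from there.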

> That said, Lucene's scoring method has never been systematically  
> tuned, and some judicious tuning based on TREC results would  
> probably benefit a majority of Lucene applications.  Ideally we can  
> develop evaluation tools, use them on a variety of datasets to find  
> better defaults for Lucene, and make the tools available so that  
> folks can fine-tune things for their particular applications.

+1 as well.
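
On the tuning side, most of the knobs Doug is talking about live in
Similarity.  For example, flattening out the default 1/sqrt(length)
length normalization is a one-liner - sketch only, the constants are
placeholders to show the mechanics, not tuned values:

  import org.apache.lucene.search.DefaultSimilarity;

  /** Example tuning knob: dampen the effect of document length on
   *  scores relative to DefaultSimilarity's 1/sqrt(numTokens). */
  public class TunedSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTokens) {
      // Placeholder constants; these are exactly the sort of values
      // the evaluation tools would let us pick empirically.
      return (float) (1.0 / Math.sqrt(0.25 * numTokens + 0.75));
    }
  }

Since lengthNorm is baked into the norms at index time, the same
Similarity has to be set on both the IndexWriter and the searcher,
and the index rebuilt between runs - all the more reason for the
evaluation harness to drive index creation too.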
