lucene-dev mailing list archives

From Doug Cutting <>
Subject Re: search quality - assessment & improvements
Date Mon, 25 Jun 2007 18:04:41 GMT
Doron Cohen wrote:
> It is very important that we be able to assess search quality in a
> repeatable manner - so that anyone can repeat the quality tests, and
> maybe find ways to improve them. (This would also allow us to verify the
> "improvement claims" above...) This capability seems like a natural part
> of the benchmark package. I started to look at extending the benchmark
> package with a search quality module that would open an index (or first
> create one), run a set of queries (similar to the performance benchmark),
> and compute and report the set of known statistics mentioned above, and
> more. Such a module depends on input data - documents, queries, and
> judgements. And that's my second question. We don't have to invent this
> data - TREC already has it, and it grows wider every year as more
> judgements are added. So, theoretically, we could use TREC data.
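The statistics such a module would report - precision at a cutoff, average precision, and the like - could be computed along these lines. This is a hypothetical sketch, not code from the benchmark package; the function names, ranked result lists, and judgement sets are made up for illustration:

```python
# Sketch of the search-quality statistics such a module might report.
# The ranked results and judgement set below are made-up examples, not TREC data.

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision@rank taken at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical ranked results for one query and its judgement set.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d4"}

print(precision_at_k(ranked, relevant, 3))   # 2 of the top 3 are relevant
print(average_precision(ranked, relevant))
```

Averaging the second number over all queries in a run gives mean average precision (MAP), the summary statistic most TREC-style evaluations report.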

We should be careful not to tune things too much for any one application 
and/or dataset.  Tools to perform evaluation would clearly be valuable, 
but changes that improve Lucene's results on TREC data may or may not be 
of general utility.  The best way to tune an application is to sample 
its query stream and evaluate those queries against its own documents.

That said, Lucene's scoring method has never been systematically tuned, 
and some judicious tuning based on TREC results would probably benefit a 
majority of Lucene applications.  Ideally we can develop evaluation 
tools, use them on a variety of datasets to find better defaults for 
Lucene, and make the tools available so that folks can fine-tune things 
for their particular applications.
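Finding better defaults across a variety of datasets might look like the sweep below. This is a hypothetical sketch: `evaluate` is a toy stand-in for running the benchmark queries with a given scoring-parameter value, and the dataset names and candidate values are invented for illustration.

```python
# Hypothetical sketch of choosing a better default for one scoring parameter
# by averaging an evaluation metric (e.g. MAP) over several datasets.

def evaluate(dataset, param):
    """Toy stand-in for a real benchmark run; returns a made-up score.
    Pretend each dataset peaks at a different parameter value."""
    peak = {"news": 0.6, "web": 0.8, "mail": 0.7}[dataset]
    return 1.0 - abs(param - peak)

def best_default(datasets, candidates):
    """Pick the candidate value with the highest mean score over all datasets."""
    def mean_score(p):
        return sum(evaluate(d, p) for d in datasets) / len(datasets)
    return max(candidates, key=mean_score)

datasets = ["news", "web", "mail"]
candidates = [0.5, 0.6, 0.7, 0.8, 0.9]
print(best_default(datasets, candidates))  # 0.7 balances the three datasets
```

The point of averaging over datasets is exactly the caution above: a value that maximizes one collection's score may be a poor default, while the best mean score is a safer shipped default that individual applications can still override.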

