lucene-dev mailing list archives

Subject Proposal: extracting term-level stats from query process
Date Thu, 11 Mar 2004 11:37:55 GMT
I think the TermScorer could be used to produce some useful feedback on the performance of
terms used in queries, with the addition of some new methods:
int getNumDocMatches();
float getAverageScore();

These could be used in the following scenarios:
* selecting which terms to offer spelling correction on (when numDocMatches==0)
* influencing the highlighter selections (doc fragments scored based on contained term weights)
* for "more like this" natural language type queries, the highlighter could highlight only
  "significantly" scored terms and ignore low-scoring noise words.

The stats accumulation code that would need adding to TermScorer would add negligible overhead,
but the main issue would be how to expose the TermScorer object to users.
I had initially planned to do all of this with a new class that required no Lucene changes.
That would have looked like this:

//wrap normal query in a new query
ProfilerQuery pq=new ProfilerQuery(anyLuceneQuery);
//run query as normal
//analyze results
ProfiledTermStats[] ts=pq.getTermStats();
for(int i=0;i<ts.length;i++)
  System.out.println(ts[i].getTerm()+" in "+ts[i].getNumMatches()+
     " docs, ave score="+ts[i].getAverageScore() );

I quickly discovered this wasn't possible without requiring a change to the existing Lucene code.

Anyone else find this a worthwhile change? I know it would be possible to derive all this
information using existing APIs, but it would effectively involve another pass over the same
index data.
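
To give a rough idea of the accumulation involved, here is a minimal sketch in plain Java with no
Lucene dependency. The class name TermStatsAccumulator and its record() method are hypothetical
and only for illustration; the assumption is that TermScorer would call something like
record(score) once for each document it scores, and the two getters match the methods proposed
above.

```java
// Hypothetical helper, not part of Lucene: accumulates per-term statistics
// as documents are scored, so no second pass over the index is needed.
final class TermStatsAccumulator {
    private int numDocMatches = 0;   // number of documents scored for this term
    private double scoreSum = 0.0;   // running total of scores (double limits rounding drift)

    // Assumed to be called once per matching document with that document's score.
    void record(float score) {
        numDocMatches++;
        scoreSum += score;
    }

    // Number of documents that matched the term.
    int getNumDocMatches() {
        return numDocMatches;
    }

    // Mean score over all matching documents; 0 when nothing matched.
    float getAverageScore() {
        if (numDocMatches == 0) {
            return 0f;
        }
        return (float) (scoreSum / numDocMatches);
    }
}
```

The per-document cost is one increment and one addition. A term whose accumulator still reports
getNumDocMatches()==0 after the search would be a candidate for spelling correction.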
