Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Date: Thu, 11 Mar 2004 11:37:55 GMT
Message-Id: <200403111137.i2BBbtdA006874@server0027.freedom2surf.net>
From: markharw00d@yahoo.co.uk
To: lucene-dev@jakarta.apache.org
Subject: Proposal: extracting term-level stats from query process

I think TermScorer could be used to produce some useful feedback on how the individual terms in a query performed, with the addition of two new methods:
int getNumDocMatches();
float getAverageScore();

These could be used in the following scenarios:
* selecting which terms to offer spelling correction on (when numDocMatches==0)
* influencing the highlighter selections (doc fragments scored based on contained term weights)
* for "more like this" natural-language-type queries, letting the highlighter highlight only "significantly" scored terms and ignore low-scoring noise words

The stats accumulation code that would need adding to TermScorer would add negligible overhead, but the main issue would be how to
expose the TermScorer object to users.
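
As a rough sketch of why the overhead should be small: the accumulation amounts to one counter and one running sum per term. The class name TermStatsAccumulator and the collect() hook below are hypothetical and not existing Lucene code; only getNumDocMatches() and getAverageScore() come from the proposal above. Returning 0 as the average when nothing matched is also just an assumption.

```java
// Hypothetical helper, not part of Lucene: the bookkeeping a term scorer
// would need in order to answer getNumDocMatches() and getAverageScore().
final class TermStatsAccumulator {
    private int numDocMatches = 0;  // documents scored for this term so far
    private double scoreSum = 0.0;  // running total of those scores

    // Assumed hook: called once for each document the scorer matches.
    void collect(float score) {
        numDocMatches++;
        scoreSum += score;
    }

    int getNumDocMatches() {
        return numDocMatches;
    }

    // Returns 0 when nothing matched (an arbitrary choice for this sketch).
    float getAverageScore() {
        return numDocMatches == 0 ? 0f : (float) (scoreSum / numDocMatches);
    }
}
```

The per-document cost is one increment and one addition, which is where the "negligible overhead" estimate comes from. It does not address the harder question of how the caller gets hold of the scorer.
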
I had initially planned to do all of this with a new class that required no Lucene changes. That would have looked like this:

//wrap normal query in a new query
ProfilerQuery pq=new ProfilerQuery(anyLuceneQuery);
//run query as normal
searcher.search(pq...)
//analyze results
ProfiledTermStats[] ts=pq.getTermStats();
for(int i=0;i<ts.length;i++)
{
  System.out.println(ts[i].getTerm()+" in "+ts[i].getNumMatches()+
     " docs, ave score="+ts[i].getAverageScore() );
}

I quickly discovered this wasn't possible without requiring a change to the existing Lucene code.

Anyone else find this a worthwhile change? I know it would be possible to derive all this information using existing 
APIs but it would effectively involve another pass of the same index data.

