lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ken Williams <ken.willi...@thomsonreuters.com>
Subject Re: Confidence scores at search time
Date Mon, 02 Mar 2009 19:47:30 GMT
Hi Grant,

It's true, I may have an X-Y problem here. =)

My basic need is to sacrifice recall to achieve greater precision.  Rather
than always presenting the user with the top N documents, I need to return
*only* the documents that seem relevant.  For some searches this may be 3
documents, for some it may be none.

My user interface in this case isn't the standard "type words in a box and
we'll show you the best docs" - I'm using Lucene as a tool in the background
to do some exploration about how I could augment a set of traditional
results with a few alternative results gleaned from a different path.

Not sure if this helps with the X-Y problem, but that's my task at hand.

Also, while perusing the threads you refer to below, I saw a reference to
the following link, which seems to have gone dead:

  https://issues.apache.org/bugzilla/show_bug.cgi?id=31841
  (in http://www.lucidimagination.com/search/document/1618ce933c8ebd6b )

Has the issue tracker moved somewhere else?

Finally, I seem unable to get Searcher.explain() to do much useful - my code
looks like:

        Searcher searcher = new IndexSearcher(reader);
        QueryParser parser = new QueryParser(LuceneIndex.CONTENT, analyzer);
        Query query = parser.parse(queryString);
        TopDocCollector collector = new TopDocCollector(n);
        searcher.search(query, collector);

        for ( ScoreDoc d : collector.topDocs().scoreDocs ) {
            String explanation = searcher.explain(query, d.doc).toString();
            Field id = searcher.doc( d.doc ).getField( LuceneIndex.ID );
            System.out.println(id + "\t" + d.score + "\t" + explanation);
        }

In the output, I get explanations like "0.88922405 = (MATCH) product of:"
with no details.  Perhaps I need to do something different in indexing?

Thanks,


 -Ken


On 2/26/09 10:36 AM, "Grant Ingersoll" <gsingers@apache.org> wrote:

> I don't know of anyone doing work on it in the Lucene community.   My
> understanding to date is that it is not really worth trying, but that
> may in fact be an outdated view.  I haven't stayed up on the
> literature on this subject, so background info on what you are
> interested in would be helpful.
> 
> Digging around in the archives a bit more, I come up with some more
> relevant emails: 
> http://www.lucidimagination.com/search/?q=comparing+scores+across+searches#/
> p:lucene,solr/s:email
> 
> What is the bigger problem that you are trying to solve?  That is, you
> imply that score comparison is the solution, but you haven't said the
> problem you are trying to solve.
> 
> Cheers,
> Grant
> 
> 
> On Feb 25, 2009, at 11:38 AM, Ken Williams wrote:
> 
>> Hi all,
>> 
>> I didn't get a response to this - not sure whether the question was
>> ill-posed, or too-frequently-asked, or just not interesting.  But if
>> anyone
>> could take a stab at it or let me know a different place to look,
>> I'd really
>> appreciate it.
>> 
>> Thanks,
>> 
>> -Ken
>> 
>> 
>> On 2/20/09 12:00 PM, "Ken Williams"
>> <ken.williams@thomsonreuters.com> wrote:
>> 
>>> Hi,
>>> 
>>> Has there been any work done on getting confidence scores at
>>> runtime, so
>>> that scores of documents can be compared across queries?  I found one
>>> reference in the mailing list to some work in 2003, but couldn't
>>> find any
>>> follow-up:
>>> 
>>>  http://osdir.com/ml/jakarta.lucene.user/2003-12/msg00093.html
>>> 
>>> Thanks.
>> 
>> -- 
>> Ken Williams
>> Research Scientist
>> The Thomson Reuters Corporation
>> Eagan, MN
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 

-- 
Ken Williams
Research Scientist
The Thomson Reuters Corporation
Eagan, MN


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message