lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Scoring for all the documents in the index relative to a query
Date Mon, 19 Nov 2007 18:38:32 GMT
Lucene only scores documents that match at least one query term. It  
does not implement a pure vector space model in which every document  
is scored; it uses a combination of the Boolean model and the VSM.  
Thus, I am not sure you can do a pure comparison. I suppose you could  
simulate the relevance by using TermVectors and looping over all  
documents, but I think one could argue this isn't exactly what Lucene  
does, so it isn't comparable.
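
A minimal sketch of the "loop over all documents" idea, for
illustration only. It assumes raw term counts as vector weights and
plain cosine similarity, which is not Lucene's scoring formula. The
class and helper names (AllDocsCosine, termFreqs, scoreAll) are
hypothetical, and in-memory strings stand in for the per-document
term vectors you would read from the index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AllDocsCosine {

    // Hypothetical stand-in for a stored term vector:
    // lowercase the text, split on non-alphanumerics, count each term.
    public static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-count vector.
    public static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity; returns 0 when either vector is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    // Scores every document, including those sharing no term with the query.
    public static double[] scoreAll(String query, List<String> docs) {
        Map<String, Integer> q = termFreqs(query);
        double[] scores = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosine(q, termFreqs(docs.get(i)));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "search file",
                "search the index",
                "unrelated text");
        double[] scores = scoreAll("search file", docs);
        for (int i = 0; i < scores.length; i++) {
            System.out.println("doc " + i + " score " + scores[i]);
        }
    }
}
```

With non-negative term counts as weights, a document that shares no
term with the query gets a score of exactly 0, and no score can be
negative. The full list therefore differs from Lucene's hits only by
the zero-scoring documents.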

http://lucene.apache.org/java/docs/scoring.html might help in  
understanding this stuff.

HTH,
Grant

On Nov 19, 2007, at 1:25 PM, HAIDUC SONIA wrote:

> I am trying to order all the documents in the index according to  
> their similarity to a given query. I am interested in having a  
> complete list of *all* the documents in the index with their score.  
> From what I understood by reading some documentation, Lucene  
> internally assigns scores to all the documents in the index  
> according to their similarity to the query, but when returning the  
> hits, all the scores that are less than 0 are rounded to 0 and only  
> the documents with the score > 0 are returned as hits. But what I  
> would like to get is the list before this intermediate processing,  
> so the list of all the documents with their raw score. I am trying  
> to compare Lucene with LSI and for the comparison I want to do, I  
> need the entire list of documents. Is there a way that I can get  
> that with Lucene?
> I hope I explained it clearly this time. If you need more details  
> let me know.
>
> Thank you,
> Sonia
>
> ----- Original Message ----
> From: Erick Erickson <erickerickson@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Monday, November 19, 2007 11:55:00 AM
> Subject: Re: Scoring for all the documents in the index relative to  
> a query
>
>
> Could you explain a bit more what problem you're trying to solve?
> The reason I ask is that your question doesn't make sense to me,
> since I have no idea what you expect by the term "negative score".
>
> My simplistic view has been that all the docs returned via Hits
> or HitCollector have scores > 0, and all the rest have scores of 0,
> and this view is supported by the explanation of
> HitCollector.collect
>
> " Called once for every non-zero scoring document, with the
> document number and its score."
>
> You might also get value from this page:
> http://lucene.apache.org/java/docs/scoring.html#Scoring
>
> Best
> Erick
>
> On Nov 19, 2007 11:05 AM, HAIDUC SONIA <haiduc_sonia@yahoo.com> wrote:
>
>> Hi everyone,
>>
>> I am trying to obtain the score for each document in the index
>> relative to a given query. For example, if I have the query
>> "search file", I am trying to get the list of all documents in the
>> index and their scores relative to the given query. I tried first
>> using Hits, which gave me the normalized score. I thought that I
>> don't see the whole list of documents and their scores because of
>> the normalization, so I tried using HitCollector. But even after
>> using HitCollector, I get the same number of matching documents, so
>> the normalization didn't exclude documents because of negative
>> scoring. Does Lucene actually compute the score for all the
>> documents in the index or just for matching documents? I really
>> need to have the scores for all the documents in the index relative
>> to the query (even if negative), not just the ones that contain the
>> query terms (this is what Lucene considers "matching documents",
>> right?). Is this possible using Lucene?
>>
>> I really appreciate your time and effort!
>> Thanks,
>> Sonia

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
