lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to get the score of a term in a document?
Date Sat, 02 Oct 2010 09:30:38 GMT
It sounds like you can just use Lucene's enum APIs (IndexReader.terms,
IndexReader.termDocs) to walk the entire index, converting it to your
format?

I'm not sure how Luke computes the "score"... but maybe you could, for
every term, make a TermQuery and then directly walk its matching docs
& scores?  You'd have to do something like:

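  // Lucene 3.0.x: weight() is an instance method of Query, so build the
  // query for the current term first.
  Weight w = new TermQuery(term).weight(searcher);
  Scorer s = w.scorer(reader, true, false);  // may be null if no doc matches

  if (s != null) {
    int docID;
    while ((docID = s.nextDoc()) != Scorer.NO_MORE_DOCS) {
      float score = s.score();  // score of this term for document docID
    }
  }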

I think?

Mike
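For reference, here is a sketch of what s.score() in the loop above is expected to return. It assumes Lucene 3.0.x with DefaultSimilarity, a single-term TermQuery, and no index-time or query-time boosts, and it follows the "practical scoring function" described in the Similarity javadoc. Whether Luke computes its displayed score the same way is an assumption. This also bears on Sahin's question: the score itself is not stored in the index. Lucene stores freq, docFreq and a one-byte field norm, and the score is computed at search time. Here N is Searcher.maxDoc() and numTerms(d) is the number of tokens in the field of document d.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\,\mathrm{idf}(t)^2\,\mathrm{boost}(t)\,\mathrm{norm}(t,d)

% One-term query q = {t}: coord = 1, boost = 1,
% queryNorm(q) = 1/\sqrt{\mathrm{idf}(t)^2} = 1/\mathrm{idf}(t), hence
\mathrm{score}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)\cdot\mathrm{norm}(t,d)

\mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{N}{\mathrm{docFreq}(t) + 1}, \qquad
\mathrm{norm}(t,d) \approx \frac{1}{\sqrt{\mathrm{numTerms}(d)}}
```

The norm is only approximate because Lucene encodes it into a single byte, so recomputing the formula by hand will not match s.score() exactly.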

On Fri, Oct 1, 2010 at 11:49 PM, Sahin Buyrukbilen
<sahin.buyrukbilen@gmail.com> wrote:
> Hi Erick,
>
> I mean the score of a term in a document (we can think of this as a one-word
> query), which is calculated using the default similarity. When I walk through
> my index term by term, Luke shows me the number of documents in which the term
> exists, and for each document there is a score field. Please check the
> attachment for the screenshot. I am very new to Lucene's jargon, so I am sorry
> if I explain things incorrectly.
>
> My question is: for a term in the index, can we retrieve the value (which I
> am calling the score) calculated using the default similarity? Is this value
> already stored in the index, or is it calculated on the fly by Luke (since I
> can only see it by using Luke)?
>
> My goal is to create an inverted index and write it into a text file in the
> following form:
>
> Term t    ft    Inverted list for t
> ----------------------------------------------------------------------
> big       2     <2, 0.148> <3, 0.088>
> in        5     <6, 0.159> <2, 0.143> <5, 0.088> <1, 0.076> <4, 0.065>
> -
> -
> -
> -
> -
> and so on for all terms. Here ft is the total frequency of term t in the whole
> index, and each <docID, score> pair gives the ID of a document containing
> term t together with the term's score in that document. The pairs are listed
> in decreasing order of score.
>
>
> I checked the documentation and found the Scorer class, but I couldn't
> understand how to use it.
>
> I hope this is a better explanation.
>
> Best.
> Sahin.
>
>
> On Fri, Oct 1, 2010 at 9:22 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>>
>> I'm not sure what you're asking for. "Score of a term in a document"? Do you
>> mean the amount a term contributed to a search for a particular document?
>> The frequency of a term in a document? ???
>>
>> Could you elaborate on what you're trying to do? If you describe the problem
>> you're trying to solve, people can provide better answers.
>>
>> Best
>> Erick
>>
>> On Fri, Oct 1, 2010 at 11:33 AM, Sahin Buyrukbilen <
>> sahin.buyrukbilen@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > I need to retrieve the score of a term in a document. I don't want to play
>> > with different scoring schemes. I just checked my index with Luke, and it
>> > shows me a score for each term in each document in which the term exists.
>> > So I just need to get that score.
>> >
>> > Can anybody help me?
>> >
>> > Thank you in advance.
>> >
>> > Sahin.
>> >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
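A minimal sketch of the last step Sahin describes (writing one line of the inverted list), assuming the <docID, score> pairs for a term have already been collected, for example from s.score() in the loop in Mike's reply. The class InvertedListLine, its Posting holder, and approxScore are hypothetical names, not Lucene API. approxScore is only an approximation of Lucene 3.0.x DefaultSimilarity for a one-term query with no boosts: it skips the lossy one-byte encoding of the field norm, so it will not exactly reproduce Lucene's (or Luke's) numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not Lucene API): formats one line of the inverted-list dump.
public class InvertedListLine {

    // One <docID, score> pair; score is a float, like Lucene's Scorer.score().
    static final class Posting {
        final int docId;
        final float score;

        Posting(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Approximation of Lucene 3.0.x DefaultSimilarity for a one-term query with
    // no boosts: sqrt(freq) * (1 + ln(maxDoc / (docFreq + 1))) / sqrt(fieldLength).
    // Assumption: ignores the one-byte norm encoding, so results differ slightly.
    static double approxScore(int freq, int docFreq, int maxDoc, int fieldLength) {
        double tf = Math.sqrt(freq);
        double idf = 1.0 + Math.log(maxDoc / (double) (docFreq + 1));
        double norm = 1.0 / Math.sqrt(fieldLength);
        return tf * idf * norm;
    }

    // Builds "term ft <doc, score> ..." with pairs in decreasing score order.
    static String format(String term, int ft, List<Posting> postings) {
        List<Posting> sorted = new ArrayList<>(postings);
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score); // higher score first
            // On equal scores, lower docID first so the output is deterministic.
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        });
        StringBuilder line = new StringBuilder(term).append(' ').append(ft);
        for (Posting p : sorted) {
            line.append(String.format(Locale.ROOT, " <%d, %.3f>", p.docId, p.score));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Scores copied from the "in" row of the example table, in arbitrary order.
        List<Posting> in = Arrays.asList(
                new Posting(2, 0.143f), new Posting(6, 0.159f), new Posting(1, 0.076f),
                new Posting(5, 0.088f), new Posting(4, 0.065f));
        System.out.println(format("in", in.size(), in));
    }
}
```

Running main should print the "in" row of Sahin's table with the pairs in decreasing score order. In a real export, ft would be whatever per-term count is wanted (in the example it equals the number of postings, which IndexReader.docFreq(term) also reports, though docFreq still counts deleted documents).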


