lucene-java-user mailing list archives

From Sahin Buyrukbilen <sahin.buyrukbi...@gmail.com>
Subject Re: How to get the score of a term in a document?
Date Sat, 02 Oct 2010 18:48:57 GMT
Hi Mike and all other friends,

The code below does almost what I want. The only thing left is to write the
<docID, score> pairs in order of score rather than docID; the current
implementation writes them in docID order. I will try to solve that problem
myself, but if anybody knows how to do it, I would appreciate it if they
shared it with me.

I am adding my code in case anybody needs it.

Regards.

Sahin.

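import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class YilmazTest {

    public static void main(String[] args) throws IOException {
        Directory dir = FSDirectory.open(new File("/home/guardian/Lucene/Indexes"));
        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        BufferedWriter out = new BufferedWriter(
                new FileWriter("/home/guardian/Lucene/output")); // output file
        try {
            System.out.println(reader.numDocs());

            TermEnum termEnum = reader.terms();
            while (termEnum.next()) {
                Term term = termEnum.term();

                // One output line per term: text, document frequency, postings.
                out.write(term.text() + "      ");
                out.write(termEnum.docFreq() + "      ");

                Scorer s = new TermQuery(term).weight(searcher).scorer(reader, true, false);

                // Take docID and score from the same scorer so they cannot drift
                // apart. The null check guards against a term with no scorable docs.
                while (s != null && s.nextDoc() != Scorer.NO_MORE_DOCS) {
                    out.write("<" + s.docID() + "," + s.score() + "> ");
                }
                out.newLine();
            }
            termEnum.close();
        } finally {
            // Close everything even if an IOException is thrown part-way through.
            out.close();
            searcher.close();
            reader.close();
            dir.close();
        }
    }
}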
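
One possible way to get the score ordering asked about above is to buffer each term's <docID, score> pairs in a list and sort the list before writing the line. The following is only a minimal sketch using plain java.util; PostingSorter and Posting are hypothetical names made up for illustration (they are not Lucene classes), and the sample values are taken from the "in" row of the example table quoted further down in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class PostingSorter {

    /** One <docID, score> pair for a single term (hypothetical helper type). */
    static final class Posting {
        final int docID;
        final float score;

        Posting(int docID, float score) {
            this.docID = docID;
            this.score = score;
        }
    }

    /** Sorts in place: highest score first, ties broken by ascending docID. */
    static void sortByScoreDescending(List<Posting> postings) {
        Collections.sort(postings, new Comparator<Posting>() {
            public int compare(Posting a, Posting b) {
                int byScore = Float.compare(b.score, a.score);
                return byScore != 0 ? byScore : Integer.compare(a.docID, b.docID);
            }
        });
    }

    /** Formats the pairs as "<docID,score> <docID,score> ...". */
    static String format(List<Posting> postings) {
        StringBuilder sb = new StringBuilder();
        for (Posting p : postings) {
            sb.append('<').append(p.docID).append(',').append(p.score).append("> ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Pairs arrive in docID order, as the scorer loop produces them.
        List<Posting> postings = new ArrayList<Posting>();
        postings.add(new Posting(1, 0.076f));
        postings.add(new Posting(2, 0.143f));
        postings.add(new Posting(4, 0.065f));
        postings.add(new Posting(5, 0.088f));
        postings.add(new Posting(6, 0.159f));

        sortByScoreDescending(postings);
        System.out.println(format(postings));
    }
}
```

In the indexing loop, this would mean adding a Posting for each document the scorer returns, then sorting and writing once the scorer is exhausted. Buffering per term keeps memory use bounded by the longest posting list rather than by the whole index.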


On Sat, Oct 2, 2010 at 9:42 AM, Sahin Buyrukbilen <
sahin.buyrukbilen@gmail.com> wrote:

> Hi Mike,
>
> I am already done with walking through the terms, frequencies, and docs
> using TermEnum, TermDocs, and IndexReader. The only thing left is the
> scores. I will try your suggestion. I hope it works.
>
> Thank you.
>
> Sahin.
>
> On Sat, Oct 2, 2010 at 5:30 AM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> It sounds like you can just use Lucene's enum APIs (IndexReader.terms,
>> IndexReader.termDocs) to walk the entire index, converting it to your
>> format?
>>
>> I'm not sure how Luke computes the "score"... but maybe you could, for
>> every term, make a TermQuery and then directly walk its matching docs
>> & scores?  You'd have to do something like:
>>
>>  Scorer s = TermQuery.weight(searcher).scorer(reader, true, false);
>>
>>  int docID;
>>  while((docID = s.nextDoc()) != Scorer.NO_MORE_DOCS) {
>>    float score = s.score();
>>  }
>>
>> I think?
>>
>> Mike
>>
>> On Fri, Oct 1, 2010 at 11:49 PM, Sahin Buyrukbilen
>>  <sahin.buyrukbilen@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > I mean the score of a term in a document (we can think of this as a
>> > one-word query), which is calculated using "Default Similarity". When I
>> > walk through my index term by term, Luke shows me the number of documents
>> > in which the term exists, and for each document there is a score field.
>> > Please check the attachment for the screenshot. I am very new to the
>> > jargon of Lucene, so I am sorry if I explain things incorrectly.
>> >
>> > My question is: for a term in the index, can we retrieve the value (what
>> > I call the score) calculated using the default similarity? Is this value
>> > already stored in the index, or is it calculated on the fly by Luke
>> > (since I can only see it by using Luke)?
>> >
>> > My goal is to create an inverted index and write it into a text file in
>> > the following form:
>> >
>> > Term t    ft    Inverted list for t
>> > ----------------------------------------------------------------------
>> > big       2     <2, 0.148> <3, 0.088>
>> > in        5     <6, 0.159> <2, 0.143> <5, 0.088> <1, 0.076> <4, 0.065>
>> > ...
>> > and so on for all terms. Here ft is the total frequency of term t in the
>> > whole index, and each <docID, score> pair gives the ID of a document in
>> > which term t has a score. The pairs are listed in decreasing order of
>> > score.
>> >
>> >
>> > I checked the documentation and found the Scorer class, but couldn't
>> > understand how to use it.
>> >
>> > I hope this is a better explanation.
>> >
>> > Best.
>> > Sahin.
>> >
>> >
>> > On Fri, Oct 1, 2010 at 9:22 PM, Erick Erickson <erickerickson@gmail.com>
>> > wrote:
>> >>
>> >> I'm not sure what you're asking for. "Score of a term in a document"?
>> >> Do you mean the amount a term contributed to a search for a particular
>> >> document? The frequency of a term in a document? ???
>> >>
>> >> Could you elaborate on what you're trying to do? If you describe the
>> >> problem you're trying to solve, people can provide better answers.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Fri, Oct 1, 2010 at 11:33 AM, Sahin Buyrukbilen <
>> >> sahin.buyrukbilen@gmail.com> wrote:
>> >>
>> >> > Hi all,
>> >> >
>> >> > I need to retrieve the score of a term in a document. I don't want to
>> >> > play with different scoring schemes. I just checked my index with Luke,
>> >> > and it shows me a score for each term in each document where the term
>> >> > exists. So I just need to get that score.
>> >> >
>> >> > Can anybody help me?
>> >> >
>> >> > Thank you in advance.
>> >> >
>> >> > Sahin.
>> >> >
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>>
>>
>
