lucene-java-user mailing list archives

From Jean-Francois Beaulac <>
Subject RE: ConstantScoreQuery and MatchAllDocsQuery
Date Tue, 27 Feb 2007 22:35:01 GMT

The existing code retrieved a TermPositionVector with IndexReader.getTermFreqVector(docId,
field). It then extracted the terms from the query and stored them in two different arrays:
one containing the single-word terms, the other containing the phrases.

For each single-word term, it loops over the array of terms and increments the frequency this way:

freq += tpv.getTermFrequencies()[tpv.indexOf(currentTerm.text())];

For phrases it works the same way, but of course it searches for the entire set of terms
in the correct order.
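
To make the phrase case concrete, here is a minimal sketch of counting phrase occurrences from per-term position arrays. It assumes the positions of each phrase term in one document are already available as sorted int arrays (for example, as returned by TermPositionVector.getTermPositions(index)). The class and method names are hypothetical and not part of Lucene. It also treats a missing term as "no match", which matters because TermFreqVector.indexOf returns -1 for a term that is absent from the document.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: counts phrase hits in one document.
final class PhraseFreq {

    /**
     * positions[i] holds the sorted positions of the i-th phrase term in the
     * document (null or empty if the term does not occur). A phrase occurrence
     * starts at position p when term i is found at p + i for every i.
     */
    static int countPhrase(int[][] positions) {
        if (positions.length == 0) {
            return 0;
        }
        // If any term is missing from the document, the phrase cannot match.
        for (int[] termPositions : positions) {
            if (termPositions == null || termPositions.length == 0) {
                return 0;
            }
        }
        int count = 0;
        for (int start : positions[0]) {
            boolean match = true;
            for (int i = 1; i < positions.length && match; i++) {
                // Positions are sorted, so a binary search is enough.
                match = Arrays.binarySearch(positions[i], start + i) >= 0;
            }
            if (match) {
                count++;
            }
        }
        return count;
    }
}
```

With a single term, the result is simply the number of positions, which is the same as the term frequency.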

Fast enough means that for the search query "on going for", if I have 3000 results consisting
of documents with an average of 1000 words each, it must be able to do it in under 50ms on a
dual Xeon machine. With the TermPositionVector, my best results with no load on the
server were around 3000ms.

I am still an amateur with Lucene. I have to migrate an application that used a customized
version of Lucene 1.3 to 2.1. I would really like to be able to use an unmodified version of
Lucene, since that would make it a lot easier to keep up to date with Lucene.

I'll give TermDocs a try.


-----Original Message-----
From: Chris Hostetter []
Sent: February 23, 2007 7:18 PM
To: Lucene Users
Subject: Re: ConstantScoreQuery and MatchAllDocsQuery

: I ask this because I need to return the frequency of the search terms
: with each of my results, I tried using the TermFreqVector object but
: unfortunately it was not fast enough, so I decided to modify lucene to
: be able to return the frequency the same way the score is returned by
: I started by adding public abstract int freq(); in package
: class, and then modified
: every implementation of Scorer to be able to get the frequency.

can you elaborate on:
 * how you were trying to use TermFreqVector
 * how you define "fast enough"
 * how you are now getting the freq() value in all of the Scorer classes?

If all you need to know is the frequency of each term in your query (and
not the frequency of all terms in the document) did you try using the
freq() method in the TermDocs iterator instead of the TermFreqVector?

using Query.extractTerms, and then getting a TermDocs instance
and iterating over those terms using seek and over the docids from your
results using skipTo should be an extremely fast way to get the freq().
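
The iteration pattern described above can be sketched without the Lucene jar by modelling one term's postings in memory. Everything in this block is a hypothetical stand-in: PostingsCursor only mimics the doc(), freq(), next() and skipTo(target) methods of TermDocs, and freqsFor is an illustrative name. The sketch assumes the result doc ids are sorted in ascending order, and that skipTo always advances at least one entry before comparing (as the TermDocs documentation describes with "the first beyond the current"), which is why the loop checks the current doc before skipping.

```java
// Hypothetical in-memory model of one term's postings; not the Lucene API.
final class QueryTermFreqs {

    static final class PostingsCursor {
        private final int[] docs;   // ascending doc ids containing the term
        private final int[] freqs;  // freqs[i] = term frequency in docs[i]
        private int pos = -1;       // before the first entry

        PostingsCursor(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        int doc() { return docs[pos]; }

        int freq() { return freqs[pos]; }

        boolean next() { return ++pos < docs.length; }

        // Always moves forward at least once, then stops at the first
        // entry whose doc id is >= target. Returns false when exhausted.
        boolean skipTo(int target) {
            do {
                if (!next()) {
                    return false;
                }
            } while (target > doc());
            return true;
        }
    }

    /** Returns the term frequency for each result doc (0 if the term is absent). */
    static int[] freqsFor(PostingsCursor cursor, int[] sortedResultDocs) {
        int[] out = new int[sortedResultDocs.length];
        boolean positioned = false;
        boolean exhausted = false;
        for (int i = 0; i < sortedResultDocs.length; i++) {
            int target = sortedResultDocs[i];
            // Only skip when the cursor is behind the target; skipping while
            // already on the target would move past it.
            if (!exhausted && (!positioned || cursor.doc() < target)) {
                exhausted = !cursor.skipTo(target);
                positioned = true;
            }
            if (!exhausted && cursor.doc() == target) {
                out[i] = cursor.freq();
            }
        }
        return out;
    }
}
```

Against a real index, the cursor would presumably be a TermDocs obtained from IndexReader.termDocs() and positioned with seek(term), once for each term collected by Query.extractTerms. The per-document cost is then a skip in one postings list rather than loading a whole term vector.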

: It works well and fast, the only problem I have is that I did not find a
: way to compute the frequency in both and
: internal scorers.

neither of those queries involve any terms, so i'm not sure what freq()
would even make sense ... "1" or "0" i would imagine.


To unsubscribe, e-mail:
For additional commands, e-mail:
