lucene-java-user mailing list archives

From "Gerard Sychay" <>
Subject Re: Count for a keyword occurance in a file
Date Fri, 30 Apr 2004 15:11:14 GMT
I had the same need recently.  Specifically, I wanted the ability to
display along with the results something like:

- The query "jra" occurred 1000 times in 600 documents.

For simple queries, the IndexReader.docFreq(Term) and
IndexReader.termDocs(Term) methods are the way to go.  But for phrase queries like

- The query "juvenile arthritis" occurred 100 times in 20 documents.

and wildcard queries ("rheum*"):

- The query "rheumatology" occurred 10 times in 5 documents.
- The query "rheumatoid" occurred 10 times in 5 documents.
- The query "rheumatic" occurred 10 times in 5 documents.

I had to do quite a bit more.  I ended up modifying all of the Query
classes and writing a Frequencies class. If you're interested, mail me.
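
A minimal sketch of the kind of report described above, for single terms and for the terms a wildcard such as "rheum*" expands to. This is not the Frequencies class mentioned here and it does not call Lucene: the class QueryFrequencySketch and all of its methods are hypothetical, and a small in-memory map stands in for the index. Against a real index the two numbers would come from IndexReader.docFreq(Term) and from summing the per-document frequencies returned through IndexReader.termDocs(Term).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for an index: term -> (document id -> occurrences in that document).
// All names are hypothetical; this is not Lucene's API.
class QueryFrequencySketch {
    private final TreeMap<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();

    // Record one occurrence for each whitespace-separated, lower-cased token.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Number of documents that contain the term.
    int docFreq(String term) {
        Map<Integer, Integer> perDoc = postings.get(term);
        return perDoc == null ? 0 : perDoc.size();
    }

    // Total number of occurrences of the term across all documents.
    int totalOccurrences(String term) {
        Map<Integer, Integer> perDoc = postings.get(term);
        if (perDoc == null) return 0;
        int total = 0;
        for (int freq : perDoc.values()) total += freq;
        return total;
    }

    // Indexed terms that start with the prefix, in sorted order ("rheum*" -> "rheum").
    List<String> expandPrefix(String prefix) {
        List<String> terms = new ArrayList<>();
        for (String term : postings.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) break; // matches are contiguous in sorted order
            terms.add(term);
        }
        return terms;
    }

    // One report line in the format used in this message.
    String report(String term) {
        return "The query \"" + term + "\" occurred " + totalOccurrences(term)
                + " times in " + docFreq(term) + " documents.";
    }

    public static void main(String[] args) {
        QueryFrequencySketch index = new QueryFrequencySketch();
        index.addDocument(0, "jra jra rheumatoid");
        index.addDocument(1, "rheumatology jra");
        index.addDocument(2, "rheumatic rheumatoid rheumatoid");

        System.out.println(index.report("jra"));
        for (String term : index.expandPrefix("rheum")) {
            System.out.println(index.report(term));
        }
    }
}
```

Phrase queries are left out on purpose: counting phrase occurrences needs term positions, which this toy map does not keep.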

BTW, I joined the list only recently.  Lucene is GREAT!

>>> Ype Kingma <> 04/29/04 02:56AM >>>
On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
> Tricky, scoring has to do with the frequency of the occurrence of the word
> as opposed to the amount of words in the file in general (Somebody correct
> me if I'm wrong), so short of an educated approximation, you could

Lucene uses two frequencies for a term: the number of docs in which it occurs
in an index (basis for IDF), and the number of times a term occurs in a document.

> the indexer to dynamically store the frequency of a word (oh so
> unadvisable). Personally I recommend the educated approximation,
> you could index the document with the number of words in it (you
> have to make sure you're not using Stop Word Analyzer or Porter Stemmer) and
> then based on the score reverse engineer the result you want.
> Nader Henein
> -----Original Message-----
> From: hemal bhatt [] 
> Sent: Wednesday, April 28, 2004 5:50 PM
> To: Lucene Users List
> Subject: Count for a keyword occurance in a file
> Hi,
> How can I get a count of the score given by Hits.Score().
> i.e. I want to know how many times a keyword occurs in a file. Any help on
> this would be appreciated.

The easiest way is to use IndexReader. I don't know what you mean by "file"
(index or document), but you can have both frequencies I mentioned
from an IndexReader, possibly using skipTo() to go to the document.
The methods are docFreq(Term) and termDocs(Term).
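
A rough sketch of the per-document lookup described above: walk a term's postings, skip forward to the wanted document, and read its frequency there. The class PostingsCursorSketch is hypothetical and works on plain arrays rather than a Lucene index. Its method names are chosen to resemble the next(), skipTo(int), doc() and freq() calls on the TermDocs object that IndexReader.termDocs(Term) returns, on the assumption that skipTo() advances to the first entry whose document number is greater than or equal to the target.

```java
// Toy cursor over one term's postings: parallel arrays of ascending document ids
// and the number of times the term occurs in each of those documents.
// Hypothetical class; it only imitates the shape of a postings enumerator.
class PostingsCursorSketch {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1; // positioned before the first entry

    PostingsCursorSketch(int[] docs, int[] freqs) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs must have the same length");
        }
        this.docs = docs;
        this.freqs = freqs;
    }

    // Move to the next entry; false once the postings are exhausted.
    boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    // Advance to the first later entry whose document id is >= target.
    boolean skipTo(int target) {
        while (next()) {
            if (docs[pos] >= target) return true;
        }
        return false;
    }

    int doc() { return docs[pos]; }

    int freq() { return freqs[pos]; }

    // Number of documents containing the term.
    int docFreq() { return docs.length; }

    // Occurrences of the term in one document, or 0 if the document lacks the term.
    static int freqInDocument(PostingsCursorSketch cursor, int docId) {
        return cursor.skipTo(docId) && cursor.doc() == docId ? cursor.freq() : 0;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9};
        int[] freqs = {3, 1, 4};
        System.out.println(new PostingsCursorSketch(docs, freqs).docFreq());
        System.out.println(freqInDocument(new PostingsCursorSketch(docs, freqs), 5));
        System.out.println(freqInDocument(new PostingsCursorSketch(docs, freqs), 6));
    }
}
```

The check on doc() after skipTo() matters: landing on a later document means the requested one does not contain the term at all.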

