lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nader S. Henein" <...@bayt.net>
Subject RE: Count for a keyword occurance in a file
Date Thu, 29 Apr 2004 07:01:03 GMT
So even an educated calculation won't do it because you'd need to know how
many documents the word occurs in (you could do a search, but that would be
overkill and impractical).

Cool

-----Original Message-----
From: Ype Kingma [mailto:ykingma@xs4all.nl] 
Sent: Thursday, April 29, 2004 10:57 AM
To: Lucene Users List
Subject: Re: Count for a keyword occurance in a file


On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
> Tricky, scoring has to do with the frequency of the occurrence of the 
> word as opposed to the amount of words in the file in general 
> (Somebody correct me if I'm wrong) , so short of an educated 
> approximation, you could hack

Lucene uses two frequencies for a term: the nr. of docs in which it occurs
in an index (basis for IDF), and the nr of times a term occurs in a
document.

> the indexer to dynamically store the frequency of a word (oh so 
> unadvisable). Personally I recommend the educated approximation, 
> because you could index the document with the number of words in it ( 
> you would have to make sure you're not using Stop Word Analyzer or 
> Port Stemmer) and then based on the score reverse engineer the result 
> you want.
>
> Nader Henein
>
> -----Original Message-----
> From: hemal bhatt [mailto:bhatthemal@rediffmail.com]
> Sent: Wednesday, April 28, 2004 5:50 PM
> To: Lucene Users List
> Subject: Count for a keyword occurance in a file
>
>
> Hi,
>
> How can I get a count of the score given by Hits.Score().
> i.e I want to know how many times a keyword occurs in a file. Any help 
> on this would be appreciated.

The easiest way is to use IndexReader. I don't know what you mean by file
(index or document), but you can have both frequencies I mentioned above
from an IndexReader, evt. using skipTo() to go to the document. The methods
are docFreq(Term) and termDocs(Term).

Regards,
Ype



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message