lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Term collection frequency
Date Mon, 22 Jun 2009 09:09:07 GMT
There's IndexReader.docFreq(Term), which returns the number of
documents that the term occurs in (deleted documents are still counted
until their segments have been merged).

But the global count of how many times a Term occurred across all docs
is not stored.

You'd have to get a TermDocs enum for that Term, iterate through all
docs, and sum up the freq() from each doc, to compute that, I believe.
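
A minimal sketch of that summation follows. So that it runs without an
index, it uses a hypothetical stand-in interface, Postings, that mirrors
only the next(), freq() and close() methods of TermDocs; the class name
CollectionFrequency and the helpers sumFreqs and fromArray are likewise
made up for illustration. Against a real index the enum would presumably
come from IndexReader.termDocs(Term), whose methods throw IOException.

```java
public class CollectionFrequency {

    /** Hypothetical stand-in for the part of Lucene's TermDocs used here. */
    public interface Postings {
        boolean next();  // advance to the next document; false when exhausted
        int freq();      // occurrences of the term in the current document
        void close();    // release the enum (the real TermDocs throws IOException)
    }

    /** Sums freq() over every document in the enum, then closes it. */
    public static long sumFreqs(Postings postings) {
        long total = 0;
        try {
            while (postings.next()) {
                total += postings.freq();
            }
        } finally {
            postings.close();
        }
        return total;
    }

    /** In-memory postings for demonstration: one entry per matching document. */
    public static Postings fromArray(final int[] freqs) {
        return new Postings() {
            private int i = -1;
            public boolean next() { return ++i < freqs.length; }
            public int freq() { return freqs[i]; }
            public void close() { }
        };
    }

    public static void main(String[] args) {
        // A term occurring 3, 1 and 2 times in three documents.
        System.out.println(sumFreqs(fromArray(new int[] {3, 1, 2})));
    }
}
```

The total is kept in a long because the sum over a large index can exceed
the range of an int. The cost is one pass over the term's postings, so it
grows with the number of documents containing the term.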

Mike

On Mon, Jun 22, 2009 at 4:55 AM, Murat Yakici <Murat.Yakici@cis.strath.ac.uk> wrote:
> Hi,
>
> As far as I know, there is no public API to get a term's collection
> frequency in Lucene, apart from writing routines with TFV or TermEnum.
> Does Lucene store the number of times a term occurs in the index? If so,
> can someone direct me to the low-level API where I can get this
> information through some extension? If that is not possible, I imagine
> this would require a change to the index format. Which classes should I
> be dealing with, and what should I be careful about when implementing
> such a change?
>
>
> Cheers,
> Murat Yakici
> Department of Computer & Information Sciences
> University of Strathclyde
> Glasgow, UK
> -------------------------------------------
> The University of Strathclyde is a charitable body, registered in Scotland,
> with registration number SC015263.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
