lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andy Roberts <m...@andy-roberts.net>
Subject Re: getting the number of occurrences within a document
Date Thu, 14 Apr 2005 16:35:51 GMT
On Thursday 14 Apr 2005 15:15, Pablo Gomes Ludermir wrote:
> Hello all,
>
> I would like to get the following information from the index:
>
> 1. Given a term, how many times the term occurs in each document.
> Something like a triple:
> < Term, Doc1, Freq> , <Term, Doc2, Freq>, <Term2, Docx, Freq>, ...
>
> Is possible to do that?
>
>
> Regards,
> Pablo

Off the top of my head... assuming you have an IndexReader (or MultiReader) called reader:

TermEnum te = reader.terms();

while (te.next()) {
	Term currentTerm = te.term();
	
	TermDocs docs = reader.termDocs(currentTerm);
	int docCounter = 1;
	while (docs.next()) {
		System.out.println(currentTerm.text() + ", doc" + docCount + ", " + docs.freq());
		docCounter++;
	}
}

HTH,

Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message