lucene-java-user mailing list archives

From "chris.b" <>
Subject Problem with termdocs.freq and other
Date Mon, 10 Dec 2007 10:58:15 GMT

Here goes,
I'm developing an application using Lucene which will evaluate the
representativeness of a list of keywords within a collection of documents.
I'm doing this by indexing the documents, then loading the list of
keywords and, using the IndexReader class and DefaultSimilarity, retrieving
the average tf of each word (where the tf is obtained through
TermDocs.freq() and the average is the sum of the tf's divided by the number
of documents) and the idf for each word. The output is printed to an HTML
document, together with the documents in which each word appears, among
other things.
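
For reference, the arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions, not the application's actual code: the class and method names are hypothetical, the document count used as the divisor is left to the caller, and the idf formula is assumed to match the log(numDocs / (docFreq + 1)) + 1 form of DefaultSimilarity.

```java
// Hypothetical helper, not part of Lucene: plain arithmetic only.
class TfIdfSketch {

    // Average raw term frequency: the sum of the per-document frequencies
    // divided by a document count chosen by the caller.
    static double averageTf(int[] freqs, int docCount) {
        if (docCount <= 0) {
            return 0.0;
        }
        long sum = 0;
        for (int f : freqs) {
            sum += f;
        }
        // Cast before dividing: int / int truncates (for example 1 / 2 == 0).
        return (double) sum / docCount;
    }

    // idf in the form assumed here for DefaultSimilarity:
    // log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int[] freqs = {1, 2}; // made-up frequencies for two documents
        System.out.println(averageTf(freqs, 2)); // prints 1.5
        System.out.println(idf(2, 10));
    }
}
```

If the running total and the document count are both integers, dividing them without a cast gives 0 whenever the total is smaller than the count, so that is one thing worth ruling out when an average prints as '0'.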

At this point, I have found two problems.
First, I have documents in which I know the word appears, but the tf still
comes out as '0' (even though the number of documents says 2).
Second, it doesn't print a list of all the documents (i.e. it says there are
2 documents which contain the word, but only one of them is printed).

I don't know if what I'm doing is correct, but to obtain the tf, I'm doing
the following:

			while ( {
				tf += termDocs.freq();

where termDocs is an enumeration of the documents which contain the word.

and for the document names I'm doing the following:

			for (int f = 0; f < listaDocNums.size(); f++) {

where listaDocNums is an ArrayList which contains the numbers of the
documents.
I must also mention that when I try printing the list of numbers, it also
doesn't contain all the documents.
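
One way to keep the list of document numbers and the frequency total consistent with each other is to fill both in a single pass over the enumeration. The sketch below illustrates that idea only; PostingCursor and ArrayCursor are hypothetical stand-ins modelled on the next()/doc()/freq() style of TermDocs, not the Lucene classes themselves, and the sample data is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in modelled on the next()/doc()/freq() cursor style
// of TermDocs; it is NOT the Lucene interface itself.
interface PostingCursor {
    boolean next();
    int doc();
    int freq();
}

// In-memory cursor over parallel arrays, for illustration only.
class ArrayCursor implements PostingCursor {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1;

    ArrayCursor(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    // Advances to the next entry; returns false once the entries run out.
    public boolean next() {
        pos++;
        return pos < docs.length;
    }

    public int doc() { return docs[pos]; }

    public int freq() { return freqs[pos]; }
}

class SinglePassSketch {
    // Walks the cursor once, appending each document number to docNums
    // and returning the summed frequency.
    static int collect(PostingCursor cursor, List<Integer> docNums) {
        int total = 0;
        while (cursor.next()) {
            docNums.add(cursor.doc());
            total += cursor.freq();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> listaDocNums = new ArrayList<Integer>();
        int total = collect(
                new ArrayCursor(new int[] {3, 7}, new int[] {2, 1}), listaDocNums);
        System.out.println(listaDocNums + " total=" + total); // prints "[3, 7] total=3"
    }
}
```

In this stand-in, a cursor that has already been walked to the end yields nothing on a second loop, so two separate loops over the same cursor would leave the second result empty.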

That's it, I think I wrote all that was needed.

Thanks in advance for any help/guidelines :)
