lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Problem with termdocs.freq and other
Date Mon, 10 Dec 2007 11:02:24 GMT
>          while (termDocs.next()) {
>             termDocs.next();
>          }

For one, this loop calls next() twice in each iteration,
so every second document is skipped.
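
To make the effect concrete, here is a minimal sketch. It does not use Lucene itself: `Cursor` and `ArrayCursor` are hypothetical stand-ins that mimic only the three TermDocs methods used in the posted loop (next(), doc(), freq()), and the postings are made-up sample data. It only illustrates the skipped-document effect, under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class TermDocsLoopDemo {

    /** Hypothetical stand-in for the TermDocs methods used in the thread. */
    interface Cursor {
        boolean next(); // advance to the next posting; false when exhausted
        int doc();      // current document number
        int freq();     // term frequency in the current document
    }

    /** Array-backed cursor over {doc, freq} pairs; purely illustrative. */
    static final class ArrayCursor implements Cursor {
        private final int[][] postings;
        private int pos = -1;

        ArrayCursor(int[][] postings) { this.postings = postings; }

        public boolean next() { return ++pos < postings.length; }
        public int doc() { return postings[pos][0]; }
        public int freq() { return postings[pos][1]; }
    }

    /** Document numbers seen and the sum of their term frequencies. */
    static final class Result {
        final List<Integer> docs = new ArrayList<>();
        int totalFreq = 0;
    }

    /** The loop as posted: next() in the condition and again in the body. */
    static Result collectWithDoubleNext(Cursor c) {
        Result r = new Result();
        while (c.next()) {
            r.docs.add(c.doc());
            r.totalFreq += c.freq();
            c.next(); // extra advance: the posting it lands on is never read
        }
        return r;
    }

    /** Same loop with next() called only in the while condition. */
    static Result collectCorrect(Cursor c) {
        Result r = new Result();
        while (c.next()) {
            r.docs.add(c.doc());
            r.totalFreq += c.freq();
        }
        return r;
    }

    public static void main(String[] args) {
        // Made-up postings: the term occurs 3 times in doc 0 and 2 times in doc 4.
        int[][] postings = {{0, 3}, {4, 2}};
        Result buggy = collectWithDoubleNext(new ArrayCursor(postings));
        Result fixed = collectCorrect(new ArrayCursor(postings));
        System.out.println("double next: docs=" + buggy.docs + " tf=" + buggy.totalFreq);
        System.out.println("single next: docs=" + fixed.docs + " tf=" + fixed.totalFreq);
    }
}
```

With two matching documents, the double-next loop records only the first one, which is consistent with seeing fewer documents listed than the document count reports. With the real API, the same fix would be to drop the second termDocs.next() from the loop body.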

"chris.b" <omelhornomedomundo@gmail.com> wrote on 10/12/2007 12:58:15:

>
> Here goes,
> I'm developing an application using Lucene which will evaluate the
> representativeness of a list of keywords within a collection of documents.
> I'm doing this by indexing the documents and then loading the list of
> keywords and, using the IndexReader class and DefaultSimilarity, retrieving
> the average tf of each word (where the tf is obtained through
> TermDocs.freq() and the average is the sum of tf's divided by the number
> of documents) and the idf for each word, and printing the output in an HTML
> document, together with the documents in which they appear, and others.
>
> At this point, I have found two problems:
> I have documents in which I know the word appears, but the tf still
> comes out as '0' (even though the number of documents says 2),
> and it doesn't print a list of all the documents (i.e. it says there are 2
> documents which contain the word, but only one of them is printed).
>
> I don't know if what I'm doing is correct, but to obtain the tf, I'm
> doing the following:
>
>          while (termDocs.next()) {
>             listaDocNums.add(termDocs.doc());
>             tf += termDocs.freq();
>             termDocs.next();
>          }
>
> where termDocs is an enumeration of the documents which contain the word.
>
> and for the document names I'm doing the following:
>
>          for (int f = 0; f < listaDocNums.size(); f++) {
>             outrstream.write(reader.document(listaDocNums.get(f)).get("filename"));
>          }
>
> where listaDocNums is an ArrayList which contains the numbers for the
> documents.
> I must also mention that when I try printing the list of numbers, it also
> doesn't contain all the documents.
>
> That's it, I think I wrote all that was needed.
>
> Thanks in advance for any help/guidelines :)
>
> Chris
> --
> View this message in context: http://www.nabble.com/Problem-with-termdocs.freq-and-other-tp14250898p14250898.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

