lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From AHMET ARSLAN <iori...@yahoo.com>
Subject Re: Term's frequency
Date Fri, 31 Jul 2009 08:00:57 GMT


> Given a term say "apache", I want to look up the lucene index
> programmatically to find out its frequency in the corpus.

I think you are asking collection frequency of a term. Term Frequency is defined between a
document and a term which is printed in the loop in the following code. And at the end there
is collection freq. which sum of tfs.

String path = "E:\\ThesaurusSolrHome\\data\\index";
        String field = "contents";
        Term term = new Term(field, "apache");

        IndexReader indexReader = IndexReader.open(path);

        TermDocs termDocs = indexReader.termDocs(term);
        int collectionFreq = 0;
        while (termDocs.next()) {
            System.out.print("Document " + termDocs.doc() + " contains the term " + term.text()
+ " ");
            System.out.println(termDocs.freq() + " times");
            collectionFreq += termDocs.freq();
        }
        indexReader.close();
        System.out.println("Collection frequency of " + term.text() + " = " + collectionFreq);






      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message