lucene-java-user mailing list archives

From dipesh <dipshres...@gmail.com>
Subject Re: tfIdf weights
Date Mon, 02 Feb 2009 05:20:59 GMT
Hi,
I used lucene-2.4.0 to get tf-idf. I'm not sure whether the newer versions have
direct methods to get tf-idf weights as well.
This is lengthy, but it might help.

    // Enumerate all the terms in the index (freader is a FilterIndexReader)
    TermEnum e = freader.terms();

    // Total number of docs in the index
    int noOfDocs = freader.numDocs();

    // Enumerator over <document, frequency> pairs
    TermDocs td = freader.termDocs();

    // Step through all the terms
    while (e.next()) {

      // Get the term; only the "contents" field is of interest
      Term t = e.term();
      if (!"contents".equals(t.field())) {
        continue;
      }

      // Position td on the <document, frequency> pairs for term t
      td.seek(t);

      // idf for this term; cast to double so the ratio is not truncated
      double idf = Math.log((double) noOfDocs / e.docFreq());

      // Loop through each document containing the term
      while (td.next()) {
        double weight = td.freq() * idf;
        // do something with the weight (td.doc() is the document number)
        // ......
      }
    }

    // Release the enumerators when done
    td.close();
    e.close();
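
The weight in the loop above is the term frequency times log(N/df), where N is the number of docs and df is the number of docs containing the term. Here is a minimal standalone sketch of just that arithmetic, with no Lucene involved; the class and method names are made up for illustration. It also shows why the ratio should be computed in floating point: when both operands are int, Java truncates the quotient before the log is taken.

```java
public class TfIdfSketch {

    // tf * ln(N / df), with the ratio computed as a double.
    public static double tfIdf(int termFreq, int numDocs, int docFreq) {
        return termFreq * Math.log((double) numDocs / docFreq);
    }

    // Same formula with int division, kept only to show the truncation:
    // numDocs / docFreq is rounded down before Math.log sees it.
    public static double tfIdfTruncated(int termFreq, int numDocs, int docFreq) {
        return termFreq * Math.log(numDocs / docFreq);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a term occurring 3 times in a doc,
        // present in 4 of 10 docs.
        int termFreq = 3, numDocs = 10, docFreq = 4;
        System.out.println("double division: " + tfIdf(termFreq, numDocs, docFreq));
        System.out.println("int division:    " + tfIdfTruncated(termFreq, numDocs, docFreq));
    }
}
```

With these numbers the int version takes log(2) instead of log(2.5), so every weight for that term comes out too small. A term that occurs in every doc gets weight 0 either way, since log(1) = 0.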

regards,
Dipesh




On Mon, Feb 2, 2009 at 2:06 AM, Rehan Abdulaziz <abdulazizrehan@gmail.com> wrote:

> Hi,
> Is it possible to retrieve the tfidf weights (or relative weight calculated
> from any formula) instead of simple term frequencies through
> getTermFreqVector()? I am more interested in knowing the relative weight of
> each term in each field rather than just the frequency of terms.
>
> Thank you very much
>
> --
> Rehan Abdul Aziz
> Software Engineer
> Scrybe Inc. (www.iscrybe.com)
> Cell: +92-344-5514583
>



-- 
----------------------------------------
"Help Ever Hurt Never"- Baba
