lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Edgar Meij" <edgar.m...@gmail.com>
Subject RE: Repeat Second time: Extract important terms by programming??
Date Wed, 22 Mar 2006 22:45:31 GMT
That's relatively easy, but not out-of-the box... 

Something like:

 private TreeMap<Double, String> getTFIDF(String index, int DocumentID, String Field
){
      try{
     IndexReader ir = IndexReader.open(index); 
    TermFreqVector tv = ir.getTermFreqVector(DocumentID, Field);
    String[] Termstv=tv.getTerms();
    Double Score;
    TreeMap<Double, String> TfIdfs = new TreeMap<Double, String>();
    int docFreq, N;
    double[] TF = getTermFreqs(tv);
    for (int i =0 ; i < tv.size(); i++){
         docFreq = ir.docFreq(new Term(Field,Termstv[i]));
           N = ir.numDocs() / docFreq;
          Score= Double.valueOf(TF[i] *  ( Math.log(N)/Math.log(2)));
          TfIdfs.put(Score, Termstv[i]);      
    }
    return TfIdfs;

Searching the mailinglist might help as well; http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/%3CA955EA1F8FE31749AEC8C998082F6C7C41A7AE@hai01.hippo.local%3E
And see also: http://www.alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html



Edgar

> -----Oorspronkelijk bericht-----
> Van: thanh nguyen [mailto:ngay01032006@yahoo.com.vn] 
> Verzonden: Wednesday, March 22, 2006 6:31 PM
> Aan: java-user@lucene.apache.org
> Onderwerp: Repeat Second time: Extract important terms by 
> programming??
> 
> Can anyone help me?
> 
> 
> 
> 	
> 
> 
> 	
> 		
> ________________________________________________________
> Bạn có sử dụng Yahoo! không? 
> Hãy xem thử trang chủ Yahoo! Việt Nam! 
> http://vn.yahoo.com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message