lucene-java-user mailing list archives

From Soeren Pekrul <soeren.pek...@gmx.de>
Subject Re: How to get Term Weights (document term matrix)?
Date Fri, 03 Nov 2006 23:52:23 GMT
Chris Hostetter wrote:
> I don't really know what a "term matrix" is, but when you ask about
> "weight" is it possible you are just looking for the TermDoc.freq() of the
> term/doc pair?

Thank you Chris,

that was also my first idea. I wanted to get the document frequency
	indexreader.docFreq(term)
and the term frequency
	termdoc.freq()
and calculate the term weight myself.
But if I change the scoring by subclassing the Similarity class, I would
also have to change my own term weight calculation to match. It would be
better to use the same scoring engine for a single term weight and for
the ranking of search results.
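
For illustration, the hand calculation described above could look roughly
like the sketch below. It is plain Java with no Lucene dependency, and it
assumes the tf and idf formulas that DefaultSimilarity uses by default,
sqrt(freq) and ln(numDocs / (docFreq + 1)) + 1. The class and method
names are hypothetical. It leaves out length norms, boosts and query
normalization, so it is not the full Lucene score, and it would drift out
of sync with a custom Similarity, which is exactly the maintenance
problem mentioned above.

```java
// Sketch only: plain Java, no Lucene dependency. Names are hypothetical.
final class TermWeightSketch {

    private TermWeightSketch() {}

    // Term-frequency factor: square root of the raw count in the document
    // (assumed to match the default Similarity).
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse-document-frequency factor: ln(numDocs / (docFreq + 1)) + 1
    // (assumed to match the default Similarity).
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hand-computed weight for one (term, document) pair.
    // freq    = termdoc.freq()
    // docFreq = indexreader.docFreq(term)
    // numDocs = indexreader.numDocs()
    static float weight(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }
}
```

The three arguments of weight() correspond to termdoc.freq(),
indexreader.docFreq(term) and indexreader.numDocs().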

It seems that there is no simple function that returns the weight of a
term in a document directly. So I decided not to iterate over the
documents of a term or the terms of a document. Instead, I iterate over
the terms of the index, search for each term, iterate over the result
documents, and use the score as my term weight for the document term
matrix:

TermEnum terms = indexreader.terms();
while (terms.next()) {
   Term term = terms.term();
   // write the term to the external document term matrix
   Hits hits = indexsearcher.search(new TermQuery(term));
   for (int i = 0; i < hits.length(); i++) {
     Document doc = hits.doc(i);
     // write the document id (key, URL or index number) to the
     // document term matrix
     float weight = hits.score(i);
     // write the term weight to the document term matrix
   }
}
terms.close(); // release the term enumeration when done
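
The "external document term matrix" in the comments above is not
specified any further. As a minimal sketch, assuming an in-memory sparse
structure is acceptable, it could be a map from term text to a map from
document key to weight. The class name and its methods are hypothetical
and only illustrate where the values from the loop would go.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a hypothetical in-memory stand-in for the external matrix.
final class DocumentTermMatrixSketch {

    // term text -> (document key -> weight); absent cells count as zero
    private final Map<String, Map<String, Float>> rows = new TreeMap<>();

    // Store the weight of one term in one document.
    void put(String term, String docKey, float weight) {
        rows.computeIfAbsent(term, t -> new TreeMap<>()).put(docKey, weight);
    }

    // Return the stored weight, or 0 if the term does not occur in the document.
    float get(String term, String docKey) {
        Map<String, Float> row = rows.get(term);
        if (row == null) {
            return 0f;
        }
        Float weight = row.get(docKey);
        return weight == null ? 0f : weight;
    }

    // Number of distinct terms that have at least one stored weight.
    int termCount() {
        return rows.size();
    }
}
```

Inside the inner loop, a call such as matrix.put(term.text(), key, weight)
would record one cell, where key is whatever document id was chosen (key,
URL or index number).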

Sören
