lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: How to get Term Weights (document term matrix)?
Date Sat, 04 Nov 2006 00:13:21 GMT

: It seems that there is no simple function to ask the weight for a term
: in a document directly. So I decide not to iterate the documents of a

as i said: it depends on what you mean by "term weight" ...

: term or the terms of a document. I'm iterating the terms of the index,
: searching for the term, iterating the result documents and using the
: score as my term weight for the document term matrix:

...okay, so it sounds like your defining term weight of a doc/term to be
the score of that document when searching for that term.

You really, *REALLY* don't wnat to be doing this using the "Hits" class
like in your example ...
   1) this will re-execute your search behind the scenes many many times
   2) the scores returnd by "Hits" are psuedo-normalized ... they will be
      meaningless for any sort of comparison.

if your concern is making sure that the score you get back matches the
score you would get from executing a search even if you change the
Similarity, you could just make sure you use the lengthNorm and tf
functions from the SImilarity class just like TermScorer does ... or you
could keep executing a TermQuery for each term like you are now, but using
a HitCollector so you get the raw score)

take a look at the Searcher.search methods that take in a HitCollector.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message