lucene-java-user mailing list archives

From "Gorka Naveira" <>
Subject Lucene Vector Model
Date Thu, 14 Dec 2006 11:08:09 GMT
I'm working on Lucene's vector model and its way of scoring, and I have
some questions.
As I understand it, Lucene adds terms to the index
(DocumentWriter.addPosition, using Postings) along with some information,
such as offset, document number and term frequency.

I would like to apply a different weighting scheme to each term, associating
IDF (inverse document frequency), BIDF (boolean IDF) or WIDF with each term.
But that means we have to look at all documents in order to compute IDF,
and as far as I can see Lucene adds documents one at a time, without any
comparison between them.

I know Lucene uses IDF, but as far as I can tell only at search time
(a posteriori) and over just a filtered set of documents, not the whole set.
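
To make the problem concrete, here is a minimal sketch in plain Java of the
collection-level weights I mean. It is not Lucene code: the class
CollectionWeights and its methods are hypothetical names, documents are
assumed to be already tokenized into lists of terms, and WIDF is assumed to
be tf(t, d) divided by the total frequency of t over all documents. Both
weights need a pass over the whole collection before any single term can be
weighted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: term weights that depend on
// the whole collection rather than on a single document.
final class CollectionWeights {

    // Document frequency: number of documents containing each term at least once.
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // Use a set so that a term is counted once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Classic IDF, log(N / df). Assumes docFreq >= 1.
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    // WIDF under the assumed definition: frequency of the term in this
    // document divided by its total frequency across all documents.
    static double widf(String term, List<String> doc, List<List<String>> docs) {
        long total = 0;
        for (List<String> d : docs) {
            total += Collections.frequency(d, term);
        }
        if (total == 0) {
            return 0.0; // term does not occur anywhere in the collection
        }
        return (double) Collections.frequency(doc, term) / total;
    }
}
```

With this sketch, a term that appears in every document gets an IDF of 0,
and a term that appears in only one document gets a WIDF of 1 in that
document.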

My questions are:
Is everything I've described correct?
Is it possible to make the changes I need (for a non-expert in Lucene)?
Has anything related been done before?

Thank you in advance, Gorka Naveira
