lucene-java-user mailing list archives

From "Mike O'Leary" <tmole...@uw.edu>
Subject Lucene's use of vectors
Date Thu, 01 Mar 2012 23:15:23 GMT
In the Javadoc page for the Similarity class, it says,

"Lucene combines Boolean model (BM) of Information Retrieval with Vector Space Model (VSM)
of Information Retrieval - documents "approved" by BM are scored by VSM."

Is the Vector Space Model referred to here different from the term vectors that can
optionally be stored in index fields? It sounds as though Lucene uses the vector space model
in all cases to rank returned results, not only when term vectors are enabled at index time.
If you have indexed without term vectors, what does Lucene use to score "approved" documents?
And if you have indexed with term vectors, what does that enable you to do that you couldn't
do with an index without them?

Is there a kind of search in Lucene in which documents are "approved" by VSM as well as scored
by it, or does that even make sense? I understand how similarity works when comparing two
documents, but I can't imagine that it would work to search by comparing a term vector built
from a set of search terms against each of the term vectors in an index, one at a time. Is
there a more efficient way of searching with a term vector of search terms, other than using
its terms in a Boolean search?
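
To make the one-at-a-time comparison concrete, here is roughly what I have in mind. This is a plain Python sketch, not Lucene code: the function names, the whitespace tokenization, and the use of raw term frequencies as vector weights are all my own assumptions for illustration, and it says nothing about how Lucene actually scores.

```python
import math
from collections import Counter


def term_vector(text):
    """Hypothetical helper: map each lowercased, whitespace-separated term
    to its raw frequency. Real analyzers and weighting schemes differ."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query, docs):
    """Compare the query vector against every document vector in turn.

    docs maps a document id to its text. Every document is visited, which is
    the linear scan I am worried about. Returns (score, doc_id) pairs with a
    non-zero score, best match first.
    """
    q = term_vector(query)
    scored = ((cosine(q, term_vector(text)), doc_id) for doc_id, text in docs.items())
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "lucene term vectors",
        "d2": "boolean model",
        "d3": "vector space model scoring",
    }
    for score, doc_id in brute_force_search("term vectors", docs):
        print(doc_id, round(score, 4))
```

The cost of this grows with the number of documents in the index regardless of how few of them share a term with the query, which is why I doubt it is what a real search does.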

I am asking because my boss asked me to list all of the ways that Lucene uses vectors in
indexing and search, and my answer revealed a lot of gaps in my understanding.
Thanks,
Mike
