lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: TermVector usage
Date Tue, 21 Feb 2006 13:53:57 GMT

On Feb 20, 2006, at 9:47 PM, Otis Gospodnetic wrote:

> As far as I can tell, most people use TermVectors for "more like  
> this" queries (see MoreLikeThis class in contrib/ somewhere)

On Feb 21, 2006, at 5:39 AM, Erik Hatcher wrote:
> I use term vectors for "more like this" queries, such as the links  
> you'll see here:
>
> 	<http://www.rossettiarchive.org/rose/?query=%2B%28%2Bblessed+%2Bdamozel%29+%2B%28archivetype%3Arad%29>

Thanks, Otis and Erik.  (MoreLikeThis is under contrib/similarity.)
Judging from the way MoreLikeThis is implemented, my impression is
that storing each document's term vector together with its stored
fields wouldn't hurt, and might help a smidge.
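To make that concrete, here is a minimal sketch of the kind of term selection MoreLikeThis performs, assuming a hypothetical record (`StoredDoc`) that keeps a document's stored fields and its term vector together. The tf * idf scoring and the idf formula follow the style of Lucene's DefaultSimilarity, but `StoredDoc`, `interestingTerms` and the sample data are illustrative assumptions, not the contrib class's actual API.

```java
import java.util.*;

/** Sketch only: StoredDoc and interestingTerms are hypothetical, not Lucene API. */
public class MoreLikeThisSketch {

    // Hypothetical record: stored fields and the term vector kept side by side.
    static final class StoredDoc {
        final Map<String, String> fields;
        final Map<String, Integer> termVector; // term -> frequency in this document

        StoredDoc(Map<String, String> fields, Map<String, Integer> termVector) {
            this.fields = fields;
            this.termVector = termVector;
        }
    }

    // idf in the style of Lucene's DefaultSimilarity: log(numDocs / (docFreq + 1)) + 1
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Rank the document's terms by tf * idf and keep the top maxTerms.
    // Only this one document's term vector is read; docFreq stands in for
    // document frequencies looked up in the term dictionary.
    static List<String> interestingTerms(StoredDoc doc, Map<String, Integer> docFreq,
                                         int numDocs, int maxTerms) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : doc.termVector.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double score = e.getValue() * idf(df, numDocs);
            scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), score));
        }
        // Highest score first.
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(maxTerms, scored.size()); i++) {
            top.add(scored.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        StoredDoc doc = new StoredDoc(
            Map.of("title", "The Blessed Damozel"),
            Map.of("blessed", 3, "damozel", 2, "the", 5));
        Map<String, Integer> docFreq = Map.of("blessed", 40, "damozel", 2, "the", 999);
        System.out.println(interestingTerms(doc, docFreq, 1000, 2));
    }
}
```

Run as-is it prints `[damozel, blessed]`: the rare term outranks the more frequent one, and "the" sinks because its idf is 1. Only one document's vector is touched, which is why co-locating it with the stored fields seems harmless for this use.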

What I don't yet see is the benefit of keeping all TermVectors side
by side in the same file.  A full vector-space search, which compares
complete document vectors and therefore has to scan every TermVector
for each query, is the only application I've thought of so far.  Of
course, such a beast is impractical for a search engine of any
reasonable size, so you need some method of data reduction.  LSI's
decomposition is one way of hacking at that problem, but you don't do
that on the fly at search time. :)  Another is the heuristic process
applied by the MoreLikeThis class, but MoreLikeThis only needs a
single document's TermVectors.
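By contrast, a brute-force full vector-space search has to visit every document's vector for every query. The sketch below assumes a hypothetical in-memory list of term-frequency maps (`FullScanSketch`, `bestMatch` and the sample data are all illustrative) and scores each document by cosine similarity. Its cost grows with the total size of all vectors: that sequential pass is the access pattern side-by-side storage would favor, and also the reason it doesn't scale.

```java
import java.util.*;

/** Sketch only: exhaustive vector-space search over a hypothetical in-memory layout. */
public class FullScanSketch {

    // Cosine similarity between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int v : b.values()) nb += (double) v * v;
        if (na == 0 || nb == 0) return 0.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Score every document against the query, one sequential pass over all
    // term vectors per query, and return the id of the best match.
    static int bestMatch(Map<String, Integer> query, List<Map<String, Integer>> allVectors) {
        int best = -1;
        double bestScore = -1.0;
        for (int docId = 0; docId < allVectors.size(); docId++) {
            double s = cosine(query, allVectors.get(docId));
            if (s > bestScore) {
                bestScore = s;
                best = docId;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> vectors = List.of(
            Map.of("blessed", 1, "sonnet", 4),
            Map.of("blessed", 3, "damozel", 2),
            Map.of("painting", 2));
        Map<String, Integer> query = Map.of("blessed", 1, "damozel", 1);
        System.out.println(bestMatch(query, vectors));
    }
}
```

With the sample data it prints `1`, the document sharing both query terms.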

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

