lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: Vector Space Model in Lucene?
Date Fri, 14 Nov 2003 19:48:59 GMT

On Friday, November 14, 2003, at 02:32  PM, Chong, Herb wrote:
> when people type in multiword queries, mostly they are interested in 
> phrases in the linguistic sense. phrases don't cross sentence 
> boundaries. you need certain features in the index and in the ranking 
> algorithm to capture that distinction and rank documents truly having 
> that phrase higher than documents that just happen to have the same 
> words as the phrase. it also has to accommodate the human tendency to 
> leave off words after mentioning the full form of the phrase once.
> Herb....

In Lucene terms, it sounds like you're after one Document per 
sentence.  You then get your sentence boundaries automatically, and 
the coord() Similarity function, which rewards a Document for matching 
more of the query's terms, gives you something like the "distance 
weighting".  At least that seems like the closest approximation Lucene 
offers.
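
To make the idea concrete, here is a rough sketch in plain Java. It 
deliberately does not call Lucene: the class and method names are made 
up for illustration, the sentence splitter is a naive regex, and the 
coord() here only imitates what I understand the default Similarity to 
compute (matched query terms divided by total query terms).  In a real 
index each sentence would be added as its own Document instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Lucene.
class SentenceCoordSketch {

    // Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    // Each returned sentence stands in for one Lucene Document.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Lowercased set of alphanumeric terms in a piece of text.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // coord-style factor (assumed form): overlap / maxOverlap, i.e. the
    // fraction of distinct query terms that occur in the sentence.
    static float coord(String query, String sentence) {
        Set<String> queryTerms = terms(query);
        if (queryTerms.isEmpty()) {
            return 0f;
        }
        Set<String> sentenceTerms = terms(sentence);
        int overlap = 0;
        for (String t : queryTerms) {
            if (sentenceTerms.contains(t)) {
                overlap++;
            }
        }
        return overlap / (float) queryTerms.size();
    }

    public static void main(String[] args) {
        String text = "Lucene uses a vector space model. "
                + "The space probe carried a camera.";
        // Print the coord-style factor for each sentence-sized "document".
        for (String sentence : splitSentences(text)) {
            System.out.println(coord("vector space model", sentence)
                    + "  " + sentence);
        }
    }
}
```

The sentence containing all three query words gets the full factor, 
while the one that merely shares "space" gets a third of it.  Note that 
this rewards co-occurrence within a sentence, not the exact phrase or 
word order.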

