lucene-general mailing list archives

From "Rajesh Munavalli" <raje...@dessci.com>
Subject RE: n-gram and multiword query
Date Thu, 14 Jul 2005 15:48:35 GMT
What if my intention is to find all three words in a document, not
necessarily in one sentence? Here is my goal:

(1) All three words appearing together should be given Rank 1
(2) All three words appearing somewhere in the same sentence should be
given Rank 2
(3) Documents containing all three words, but in different sentences,
should be given Rank 3
(4) Documents missing one or more of the query terms should be given
Rank 4
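
The four tiers above can be sketched in a few lines of Python. This is
only an illustration of the ranking goal, not how Lucene scores
documents. The names tokenize, split_sentences, contains_phrase and
rank_document are hypothetical, "appearing together" is assumed to mean
adjacent and in query order, and sentences are split naively on
., ! and ?.

```python
import re

def tokenize(text):
    # Lowercase and keep alphanumeric runs only (assumed tokenization).
    return re.findall(r"[a-z0-9]+", text.lower())

def split_sentences(text):
    # Naive sentence splitter: break on ., ! and ? (an assumption).
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def contains_phrase(tokens, terms):
    # True if terms occur as one contiguous run, in order, inside tokens.
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

def rank_document(text, query):
    terms = tokenize(query)
    sentences = [tokenize(s) for s in split_sentences(text)]
    all_tokens = {t for s in sentences for t in s}
    if not set(terms) <= all_tokens:
        return 4  # one or more query terms missing
    if any(contains_phrase(s, terms) for s in sentences):
        return 1  # all terms adjacent, in query order
    if any(set(terms) <= set(s) for s in sentences):
        return 2  # all terms within a single sentence
    return 3      # all terms present, spread over several sentences
```

Sorting documents by the value returned from rank_document would give
the ordering described above; ties inside a tier are left unordered.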

Correct me if I am wrong, but proximity search is concerned with query
terms appearing within a certain distance of one another in the
document.
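
To make that notion of proximity concrete, here is a rough sketch that
finds the shortest run of tokens containing every query term and
compares it with a maximum distance. It is not Lucene code; the names
min_window_span and within_proximity and the max_span parameter are
made up for illustration, and the tokens are assumed to be already
lowercased.

```python
def min_window_span(tokens, terms):
    # Length, in tokens, of the shortest window that contains every term,
    # or None if some term never occurs.
    needed = set(terms)
    last_seen = {}  # term -> most recent position
    best = None
    for i, tok in enumerate(tokens):
        if tok in needed:
            last_seen[tok] = i
            if len(last_seen) == len(needed):
                # Shortest window ending at i starts at the oldest last-seen term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best

def within_proximity(tokens, terms, max_span):
    # True if all terms fit inside some window of at most max_span tokens.
    span = min_window_span(tokens, terms)
    return span is not None and span <= max_span
```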

Thanks,

Rajesh Munavalli

-----Original Message-----
From: Chen Wei Zhu [mailto:moonshotter@gmail.com] 
Sent: Thursday, July 14, 2005 10:40 AM
To: general@lucene.apache.org
Subject: Re: n-gram and multiword query

As I remember, Lucene doesn't do anything for proximity.

On 7/14/05, Rajesh Munavalli <rajeshm@dessci.com> wrote:
> Consider a document with the following contents: "Levenshtein distance
> is named after the Russian scientist Vladimir Levenshtein and is also
> called edit distance"
> 
> Possible bi-grams are (after removing the stop words in the beginning 
> and end) "Levenshtein distance", "named after", "Russian scientist", 
> "scientist Vladimir", "Vladimir Levenshtein", "called edit", "edit
> distance"
> 
> If my query term is "Vladimir levenshtein distance", how does Lucene
> compute the similarity to the indexed terms? Are query terms appearing
> together given more importance? How does it account for gaps (caused
> by stop word removal) while matching a multiword query?
> 
> thanks,
> 
> Rajesh Munavalli
> 
> 


--
Thanks!
yours, WeiZhu Chen
