lucene-general mailing list archives

From "Rajesh Munavalli"
Subject RE: n-gram and multiword query
Date Thu, 14 Jul 2005 15:48:35 GMT
What if my intention was to find all three words in a document, not
necessarily in one sentence? Here is my goal:

(1) All three words appearing together should be given Rank 1
(2) All three words appearing somewhere in the same sentence should be
given Rank 2
(3) Documents containing the words in different sentences should be given
Rank 3
(4) Documents missing one or more of the query terms should be given Rank 4
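
To make the four tiers above concrete, here is a minimal sketch in plain Java. It does not use Lucene, and it is not how Lucene scores documents. The class and method names (`RankTierSketch`, `tier`) are hypothetical. It assumes "appearing together" means adjacent and in query order, and that sentences end at `.`, `!` or `?`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating the four rank tiers; not a Lucene API.
class RankTierSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns 1..4 following the tiers listed in the message.
    static int tier(String document, String query) {
        List<String> q = tokens(query);
        // Tier 4: at least one query term is missing from the document.
        if (q.isEmpty() || !tokens(document).containsAll(q)) {
            return 4;
        }
        boolean sameSentence = false;
        // Assumption: a naive sentence split on ., ! and ?
        for (String sentence : document.split("[.!?]+")) {
            List<String> s = tokens(sentence);
            // Tier 1: the query terms form one contiguous run, in order.
            if (Collections.indexOfSubList(s, q) >= 0) {
                return 1;
            }
            // Tier 2 candidate: all terms in this sentence, in any order.
            if (s.containsAll(q)) {
                sameSentence = true;
            }
        }
        // Tier 3: all terms present, but spread over several sentences.
        return sameSentence ? 2 : 3;
    }

    public static void main(String[] args) {
        String doc = "Levenshtein distance is named after the Russian scientist "
                + "Vladimir Levenshtein and is also called edit distance";
        System.out.println(tier(doc, "Vladimir levenshtein distance"));
    }
}
```

In a real index the same ordering would more likely come from combining several queries with different boosts than from a post-hoc classifier like this one.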

Correct me if I am wrong... Proximity search is concerned with query
terms appearing close to one another, within a certain distance, in the


Rajesh Munavalli

-----Original Message-----
From: Chen Wei Zhu
Sent: Thursday, July 14, 2005 10:40 AM
Subject: Re: n-gram and multiword query

I remember Lucene doesn't do anything for proximity.

On 7/14/05, Rajesh Munavalli wrote:
> Consider a document with the following contents: "Levenshtein distance
> is named after the Russian scientist Vladimir Levenshtein and is also
> called edit distance"
> Possible bi-grams are (after removing the stop words in the beginning
> and end) "Levenshtein distance", "named after", "Russian scientist",
> "scientist Vladimir", "Vladimir Levenshtein", "called edit", "edit
> distance"
> If my query term is "Vladimir levenshtein distance", how does Lucene
> compute the similarity to the indexed terms? Are query terms appearing
> together given more importance? How does it account for gaps (caused
> by stop word removal) while matching a multiword query?
> thanks,
> Rajesh Munavalli
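
The bi-gram list in the quoted message can be reproduced with a small sketch in plain Java. It is not a Lucene analyzer. The class name `BigramSketch` and the stop list (`is`, `the`, `and`, `also`) are assumptions, chosen only because they yield the seven bi-grams listed above. A pair of adjacent words is dropped when either word is a stop word, so no bi-gram bridges the gap a removed stop word leaves behind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the stop list is an assumption, not Lucene's default.
class BigramSketch {

    static final Set<String> STOP =
            new HashSet<>(Arrays.asList("is", "the", "and", "also"));

    static boolean isStop(String word) {
        return STOP.contains(word.toLowerCase(Locale.ROOT));
    }

    // Emit each pair of adjacent words in which neither word is a stop word.
    static List<String> bigrams(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            if (isStop(words[i]) || isStop(words[i + 1])) {
                continue; // a stop word at the beginning or end of the pair
            }
            out.add(words[i] + " " + words[i + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "Levenshtein distance is named after the Russian scientist "
                + "Vladimir Levenshtein and is also called edit distance";
        for (String b : bigrams(doc)) {
            System.out.println(b);
        }
    }
}
```

With these bi-grams as index terms, the query "Vladimir levenshtein distance" has no single matching term. It would have to be broken into its own bi-grams first, and after lowercasing only "Vladimir Levenshtein" and "Levenshtein distance" would match.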

yours, WeiZhu Chen
