lucene-general mailing list archives

From Chen Wei Zhu <moonshot...@gmail.com>
Subject Re: n-gram and multiword query
Date Thu, 14 Jul 2005 15:51:42 GMT
Hi Munavalli,

For (1), (2), and (3), it seems only proximity search could solve this
problem. As for (4), Lucene already accounts for it through the
coordination factor of a document's score, which rewards documents
that match more of the query terms.

In my view, you are partially right about proximity search: it also
takes the order of the terms into account, not only their distance.
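
To make the four ranks concrete, here is a minimal sketch in plain
Python. It does not use Lucene and is not how Lucene scores documents.
The names (rank_tier, contains_phrase, and so on) are made up for
illustration, sentences are naively split on ".", "!" and "?", and
"appearing together" in (1) is assumed to mean adjacent and in query
order.

```python
import re


def tokenize(text):
    # Lowercase and keep runs of letters/digits (assumed tokenization).
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    # Naive sentence splitter: an assumption, not a real segmenter.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def contains_phrase(tokens, terms):
    # True if `terms` occurs as a contiguous run inside `tokens`.
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def rank_tier(document, query):
    # Returns 1..4 following the four ranks listed in the question.
    terms = tokenize(query)
    sentences = [tokenize(s) for s in split_sentences(document)]
    all_tokens = {t for s in sentences for t in s}

    # Rank 4: at least one query term is missing from the document.
    if not all(t in all_tokens for t in terms):
        return 4
    # Rank 1: all terms adjacent, in query order, within one sentence.
    if any(contains_phrase(s, terms) for s in sentences):
        return 1
    # Rank 2: all terms somewhere in the same sentence.
    if any(all(t in s for t in terms) for s in sentences):
        return 2
    # Rank 3: all terms present, but spread over different sentences.
    return 3


doc = ("Levenshtein distance is named after the Russian scientist "
       "Vladimir Levenshtein and is also called edit distance")
print(rank_tier(doc, "Vladimir levenshtein distance"))
```

With the example document from the original question, the query
"Vladimir levenshtein distance" lands in rank 2: all three words are in
the one sentence, but "distance" does not directly follow "Vladimir
Levenshtein".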

On 7/14/05, Rajesh Munavalli <rajeshm@dessci.com> wrote:
>  What if my intention was to find all three words in a document not
> necessarily in one sentence? Here is my goal
> 
> (1) All three words appearing together should be given Rank 1
> (2) Three words appearing somewhere in the sentence given Rank 2
> (3) Documents containing words in different sentences should be given
> Rank 3
> (4) Documents missing one or more of query terms should be given Rank 4
> 
> Correct me if I am wrong... Proximity search is concerned about query
> terms appearing closer to one another within a certain distance in the
> document.
> 
> Thanks,
> 
> Rajesh Munavalli
> 
> -----Original Message-----
> From: Chen Wei Zhu [mailto:moonshotter@gmail.com]
> Sent: Thursday, July 14, 2005 10:40 AM
> To: general@lucene.apache.org
> Subject: Re: n-gram and multiword query
> 
> i remember lucene doesn't do anything for proximity.
> 
> On 7/14/05, Rajesh Munavalli <rajeshm@dessci.com> wrote:
> > Consider a document with the following contents: "Levenshtein distance
> > is named after the Russian scientist Vladimir Levenshtein and is also
> > called edit distance"
> >
> > Possible bi-grams are (after removing the stop words in the beginning
> > and end) "Levenshtein distance", "named after", "Russian scientist",
> > "scientist Vladimir", "Vladimir Levenshtein", "called edit", "edit
> > distance"
> >
> > If my query term is "Vladimir levenshtein distance", how does Lucene
> > compute the similarity to the indexed terms? Are query terms appearing
> > together given more importance? How does it account for gaps (caused
> > by stop word removal) while matching a multiword query?
> >
> > thanks,
> >
> > Rajesh Munavalli
> >
> >
> 
> 
> --
> Thanks!
> yours, WeiZhu Chen
> 


-- 
Thanks!
yours, WeiZhu Chen
