From: Chen Wei Zhu
To: general@lucene.apache.org
Subject: Re: n-gram and multiword query
Date: Thu, 14 Jul 2005 23:51:42 +0800

Hi Munavalli,

For (1), (2), and (3), it seems only proximity search can solve this
problem. As for (4), Lucene already accounts for it through the
coordination (coord) factor of a document, which rewards documents that
match more of the query terms. In my view, you are partially right about
proximity search, since proximity also takes the order of the terms
into account.

On 7/14/05, Rajesh Munavalli wrote:
> What if my intention was to find all three words in a document, not
> necessarily in one sentence? Here is my goal:
>
> (1) All three words appearing together should be given Rank 1
> (2) Three words appearing somewhere in the sentence given Rank 2
> (3) Documents containing words in different sentences should be given
> Rank 3
> (4) Documents missing one or more of query terms should be given Rank 4
>
> Correct me if I am wrong... Proximity search is concerned with query
> terms appearing close to one another, within a certain distance in the
> document.
>
> Thanks,
>
> Rajesh Munavalli
>
> -----Original Message-----
> From: Chen Wei Zhu [mailto:moonshotter@gmail.com]
> Sent: Thursday, July 14, 2005 10:40 AM
> To: general@lucene.apache.org
> Subject: Re: n-gram and multiword query
>
> I recall that Lucene doesn't do anything for proximity.
>
> On 7/14/05, Rajesh Munavalli wrote:
> > Consider a document with the following contents: "Levenshtein
> > distance is named after the Russian scientist Vladimir Levenshtein
> > and is also called edit distance"
> >
> > Possible bi-grams are (after removing the stop words in the beginning
> > and end) "Levenshtein distance", "named after", "Russian scientist",
> > "scientist Vladimir", "Vladimir Levenshtein", "called edit", "edit
> > distance"
> >
> > If my query term is "Vladimir levenshtein distance", how does Lucene
> > compute the similarity to the indexed terms? Are query terms appearing
> > together given more importance? How does it account for gaps (caused
> > by stop word removal) while matching multiword query?
> >
> > thanks,
> >
> > Rajesh Munavalli
>
> --
> Thanks!
> yours, WeiZhu Chen

--
Thanks!
yours, WeiZhu Chen
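
For illustration only, the four rank tiers listed in the thread could be sketched in plain Python as below. This does not use Lucene and is not how Lucene scores documents. The function name rank_tier, the naive sentence splitter, and the reading of "appearing together" as "adjacent, in query order, within one sentence" are all assumptions made for this sketch.

```python
import re


def rank_tier(query, document):
    """Return 1..4 for the tiers listed in the thread (1 is best).

    Hypothetical helper; assumptions are noted inline.
    """
    terms = query.lower().split()

    # Assumption: sentences end at '.', '!' or '?'. Real text needs a
    # proper sentence splitter.
    sentences = [re.findall(r"\w+", part.lower())
                 for part in re.split(r"[.!?]+", document)]
    sentences = [tokens for tokens in sentences if tokens]

    # Tier 4: at least one query term does not occur in the document.
    vocabulary = {token for tokens in sentences for token in tokens}
    if not all(term in vocabulary for term in terms):
        return 4

    # Tier 1 (assumed meaning of "together"): the terms occur as one
    # contiguous run, in query order, inside a single sentence.
    n = len(terms)
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == terms:
                return 1

    # Tier 2: every term occurs somewhere in the same sentence.
    for tokens in sentences:
        if set(terms) <= set(tokens):
            return 2

    # Tier 3: every term occurs, but no single sentence holds them all.
    return 3


doc = ("Levenshtein distance is named after the Russian scientist "
       "Vladimir Levenshtein and is also called edit distance.")
print(rank_tier("Vladimir Levenshtein distance", doc))
```

With the example document from the thread, the query "Vladimir Levenshtein distance" lands in tier 2 under these assumptions: all three words are in the one sentence, but "Vladimir Levenshtein" is followed by "and", not "distance".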