Subject: Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)
From: Erick Erickson
To: java-user@lucene.apache.org
Date: Mon, 28 Jun 2010 09:20:08 -0400

I think you're misunderstanding the intent of PhraseQueries and slop.
Slop is the number of intervening tokens that may exist between the words
you're looking for. However, all the words you're looking for MUST exist.

So,
<<< whenever the search phrase contains a word that doesn't exist in the
document, the search result will be empty >>>
is exactly how this is intended to work.

HTH
Erick

On Mon, Jun 28, 2010 at 9:09 AM, a peng wrote:

> Hi,
>
> My test result is that whenever the search phrase contains a word that
> doesn't exist in the document, the search result will be empty no matter
> how big a slop factor I set. Is this a bug in Lucene, or does it work as
> designed?
>
> 2010/6/28 tarun sapra
>
> > Hi,
> >
> > I think I have been able to understand what's happening here...
> >
> > Indexed content: "This is a test".
> > Your search phrase: "This is a formal test"
> >
> > You're setting the slop factor to 2. If your slop factor is 3, it should
> > work, because "is" and "a" are stop words; thus the words "This" and
> > "test" are 2 slop factor apart, but in your search phrase "This is a
> > formal test" the words "This" and "test" are 3 slop factor apart. That's
> > why it's not working. In the search phrase "This is formal test", the
> > words "This" and "test" are 2 slop factor apart; that's why this phrase
> > is working.
> >
> > On Mon, Jun 28, 2010 at 11:37 AM, a peng wrote:
> >
> > > Hi,
> > >
> > > I am using StandardAnalyzer(Version.LUCENE_30);
> > >
> > > 2010/6/27 tarun sapra
> > >
> > > > Which analyzer are you using?
> > > >
> > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I know the indexed content contains the following text: "This is a
> > > > > test". The search phrase I used is "This is a formal test", and I
> > > > > set the slop of the PhraseQuery to 2 with setSlop(2), but I found
> > > > > that I cannot get a search result. If I set the search phrase to
> > > > > "This is formal test", then I can get the search result.
> > > > >
> > > > > So what is the problem here? Thanks in advance.
> > > > >
> > > > > Attached is the Javadoc for the setSlop method:
> > > > >
> > > > > public void *setSlop*(int s)
> > > > >
> > > > > Sets the number of other words permitted between words in query
> > > > > phrase. If zero, then this is an exact phrase search. For larger
> > > > > values this works like a WITHIN or NEAR operator.
> > > > >
> > > > > The slop is in fact an edit-distance, where the units correspond to
> > > > > moves of terms in the query phrase out of position. For example, to
> > > > > switch the order of two words requires two moves (the first move
> > > > > places the words atop one another), so to permit re-orderings of
> > > > > phrases, the slop must be at least two.
> > > > >
> > > > > More exact matches are scored higher than sloppier matches, thus
> > > > > search results are sorted by exactness.
> > > > >
> > > > > The slop is zero by default, requiring exact matches.
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Tarun Sapra
> >
> > --
> > Thanks & Regards
> > Tarun Sapra
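
For anyone who wants to experiment with the rule discussed in this thread
(every phrase term must be present; slop only bounds how far terms may be
moved out of position), here is a rough, self-contained Java sketch. It is
not Lucene code and does not call the Lucene API: the class SlopSketch and
the method matches are hypothetical names made up for illustration, tokens
are plain lowercase words with no analyzer or stop-word removal, and
repeated terms in the phrase are not treated specially. It models slop as
the spread of (document position - phrase position) over the chosen
occurrences, which is only an approximation of what PhraseQuery does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlopSketch {

    /**
     * Simplified model: true if every phrase term occurs in the document and
     * some choice of occurrences keeps the spread of (docPos - phrasePos)
     * within the given slop. Hypothetical helper, not a Lucene method.
     */
    public static boolean matches(List<String> doc, List<String> phrase, int slop) {
        if (phrase.isEmpty()) {
            return false;
        }
        // Collect the document positions of each phrase term.
        List<List<Integer>> positions = new ArrayList<>();
        for (String term : phrase) {
            List<Integer> found = new ArrayList<>();
            for (int i = 0; i < doc.size(); i++) {
                if (doc.get(i).equals(term)) {
                    found.add(i);
                }
            }
            if (found.isEmpty()) {
                return false; // a missing term cannot be bridged by any slop
            }
            positions.add(found);
        }
        return search(positions, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop);
    }

    // Try every combination of occurrences, tracking the smallest and largest
    // shift (document position minus position in the phrase) seen so far.
    private static boolean search(List<List<Integer>> positions, int term,
                                  int min, int max, int slop) {
        if (term == positions.size()) {
            return max - min <= slop;
        }
        for (int docPos : positions.get(term)) {
            int shift = docPos - term;
            if (search(positions, term + 1,
                       Math.min(min, shift), Math.max(max, shift), slop)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("this", "is", "a", "test");
        // "test" sits two positions later than the phrase expects.
        System.out.println(matches(doc, Arrays.asList("this", "test"), 2));
        System.out.println(matches(doc, Arrays.asList("this", "test"), 1));
        // Swapping two adjacent words costs two moves.
        System.out.println(matches(doc, Arrays.asList("is", "this"), 2));
        // "formal" is absent from the document, so slop is irrelevant.
        System.out.println(matches(doc,
                Arrays.asList("this", "is", "a", "formal", "test"), 100));
    }
}
```

In this model the last call stays false however large the slop is, which
is the behaviour Erick describes above. With a real index the analyzer
(stop words, position increments) also changes which terms and positions
the query sees, so results from StandardAnalyzer may differ from this toy.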