From: Erik Hatcher
Subject: Re: QueryParser, phrases and stopwords
Date: Thu, 16 Jun 2005 10:23:07 -0400
To: java-user@lucene.apache.org
In-Reply-To: <42B17724.2060002@cos.com>
References: <42B05381.9060409@cos.com> <42B17724.2060002@cos.com>

Are there any other issues or concerns with making this change to
StopFilter? Should we make this change in 1.9? Or wait until after
2.0 is released?

Mike - if you could create some test cases for this scenario and
contribute your patch and tests to Bugzilla, barring any objections,
I'll apply it.

    Erik

On Jun 16, 2005, at 8:57 AM, Mike Barry wrote:

> Erik,
>
> Thanks, I applied the changes found in version 150148 of StopFilter.java
> and they work great for me. I did remove the setting of position=1 before
> the return of the token since that seemed spurious to me. Here's a context
> diff of the current StopFilter.java and my changes:
>
> *** analysis/StopFilter.java.old Thu Jun 16 07:42:28 2005
> --- analysis/StopFilter.java Thu Jun 16 08:44:50 2005
> ***************
> *** 94,109 ****
>      * Returns the next input Token whose termText() is not a stop word.
>      */
>     public final Token next() throws IOException {
> -     int position = 1;
> -
>       // return the first non-stop word found
> !     for (Token token = input.next(); token != null; token = input.next()) {
> !       if (!stopWords.contains(token.termText)) {
> !         token.setPositionIncrement( position );
>           return token;
> -       }
> -       position++;
> -     }
>       // reached EOS -- return null
>       return null;
>     }
> --- 94,103 ----
>      * Returns the next input Token whose termText() is not a stop word.
>      */
>     public final Token next() throws IOException {
>       // return the first non-stop word found
> !     for (Token token = input.next(); token != null; token = input.next())
> !       if (!stopWords.contains(token.termText))
>           return token;
>       // reached EOS -- return null
>       return null;
>     }
>
> Erik Hatcher wrote:
>
>> On Jun 15, 2005, at 12:12 PM, Mike Barry wrote:
>>
>>> I have a situation where a query such as "climate control" is returning
>>> documents with the phrase "climate of control". (I'm using QueryParser).
>>>
>>> After searching, I found a similar issue on the mailing list from
>>> Greg Robertson, with a patch from Steve Rowe.
>>>
>>> Looking at the source repository for StopFilter.java, the patch was
>>> applied in November of 2003 and then reverted in Dec 2003 (by Erik),
>>> with the note:
>>>
>>>     revert position increment change due to conflict with PhraseQuery
>>>
>>> (the patch incremented the token position to inhibit exact matching
>>> across removed stopword(s)).
>>>
>>> I couldn't find any info on how/why this approach conflicted with
>>> PhraseQuery. Can anyone enlighten me on this? Does anyone know of a way
>>> to inhibit exact matching across removed stopword(s)?
>>
>> PhraseQuery originally did not account for gaps left in the terms of
>> the phrase.
>>
>> PhraseQuery was modified last year to allow for this though:
>>
>> r150509 | goller | 2004-09-15 05:38:50 -0400 (Wed, 15 Sep 2004) | 5 lines
>>
>> PhraseQuery and PhrasePrefixQuery are extended. It's now
>> possible to specify the relative position of a term within
>> a phrase. This allows gaps and multiple terms at the same
>> position.
>> -----
>>
>> So we could change StopFilter to put the gaps back in safely now, I
>> think.
>>
>> Thoughts?
>>
>> Erik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
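
To make the position-gap idea in this thread concrete, here is a minimal, self-contained sketch. It does not use the Lucene API: StopGapSketch, PositionedTerm, analyze, matchesExactPhrase and the keepGaps flag are all hypothetical names invented for illustration. It only assumes what the thread describes: a stop filter can either collapse positions when it drops a stopword or leave a gap, and an exact phrase match compares the relative positions of the terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustration only; none of these types are Lucene classes. */
class StopGapSketch {

    /** Hypothetical stand-in for a token: its text and absolute position. */
    static final class PositionedTerm {
        final String text;
        final int position;

        PositionedTerm(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    /**
     * Splits on whitespace, lowercases, and drops stopwords. With keepGaps set,
     * each dropped stopword adds one to the next term's position increment;
     * without it, surviving terms are numbered consecutively.
     */
    static List<PositionedTerm> analyze(String text, Set<String> stopWords, boolean keepGaps) {
        List<PositionedTerm> out = new ArrayList<>();
        int position = -1;
        int increment = 1;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (stopWords.contains(word)) {
                if (keepGaps) {
                    increment++; // remember the hole left by the stopword
                }
                continue;
            }
            position += increment;
            out.add(new PositionedTerm(word, position));
            increment = 1;
        }
        return out;
    }

    /** True if every phrase term occurs in doc at the same relative position. */
    static boolean matchesExactPhrase(List<PositionedTerm> doc, List<PositionedTerm> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        PositionedTerm first = phrase.get(0);
        for (PositionedTerm start : doc) {
            if (!start.text.equals(first.text)) {
                continue;
            }
            int offset = start.position - first.position;
            boolean all = true;
            for (PositionedTerm p : phrase) {
                if (!containsAt(doc, p.text, p.position + offset)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    private static boolean containsAt(List<PositionedTerm> doc, String text, int position) {
        for (PositionedTerm t : doc) {
            if (t.position == position && t.text.equals(text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("of", "the", "a"));
        String doc = "climate of control";

        // Positions collapsed: "climate control" matches the document.
        System.out.println(matchesExactPhrase(
                analyze(doc, stop, false), analyze("climate control", stop, false))); // true

        // Gaps kept: "control" sits two positions after "climate", so no match.
        System.out.println(matchesExactPhrase(
                analyze(doc, stop, true), analyze("climate control", stop, true))); // false

        // Gaps kept on both sides: the original phrase still matches.
        System.out.println(matchesExactPhrase(
                analyze(doc, stop, true), analyze("climate of control", stop, true))); // true
    }
}
```

The third case is the one that depends on the phrase matcher honoring relative positions: if the query side ignored the gap, "climate of control" would stop matching its own document once the index side kept gaps, which is roughly the conflict described for the original 2003 patch.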