From: Otis Gospodnetic
To: Lucene Developers List
Date: Thu, 13 May 2004 14:48:46 -0700 (PDT)
Subject: Re: DO NOT REPLY [Bug 28960] New: - Add "an" to the English stop words
Message-ID: <20040513214846.21584.qmail@web12703.mail.yahoo.com>

Yeah, I think that would cause problems for some people.
I'm for closing that bug, and maybe even removing all the stop words
from the Lucene core so that people don't rely on them, as they are
really more for a demo and not something to depend on.

Otis

--- Erik Hatcher wrote:

> I don't mind adding "an" to the list, but should we be concerned about
> any backwards compatibility issues with this change?
>
> Erik
>
>
> On May 13, 2004, at 2:05 PM, bugzilla@apache.org wrote:
>
> > DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
> > RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
> > .
> > ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
> > INSERTED IN THE BUG DATABASE.
> >
> > http://issues.apache.org/bugzilla/show_bug.cgi?id=28960
> >
> > Add "an" to the English stop words
> >
> >            Summary: Add "an" to the English stop words
> >            Product: Lucene
> >            Version: unspecified
> >           Platform: PC
> >         OS/Version: Windows NT/2K
> >             Status: NEW
> >           Severity: Minor
> >           Priority: Other
> >          Component: Analysis
> >         AssignedTo: lucene-dev@jakarta.apache.org
> >         ReportedBy: ats37@hotmail.com
> >
> >
> > In org.apache.lucene.analysis.StopAnalyzer, the ENGLISH_STOP_WORDS array
> > contains "a" but not "an". So searching for "a fund" will get the same hits as
> > "fund", but searching for "an investment" will get many more hits than
> > "investment".
> >
> > This is true in the latest revision of the file, but appears to have always been the case.
> > I'm amazed nobody's pointed it out before now, our users had only
> > been testing for a few hours before they complained about it :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
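
For readers following the thread, here is a minimal sketch of the behavior the bug report describes. It is plain Java and does not use Lucene. The class name StopWordSketch, the analyze method, and the word list are all hypothetical; the list is a small illustrative subset, not the actual contents of ENGLISH_STOP_WORDS. It only assumes, as the report states, that "a" is present and "an" is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordSketch {
    // Illustrative subset only (an assumption, not the real Lucene list):
    // it contains "a" but not "an", matching the situation in the bug report.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "the", "of"));

    // Lowercase the query, split on whitespace, and drop any stop words.
    static List<String> analyze(String query, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String token : query.toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "a" is removed, so the query reduces to the single term "fund".
        System.out.println(analyze("a fund", STOP_WORDS));

        // "an" survives as a search term alongside "investment".
        System.out.println(analyze("an investment", STOP_WORDS));

        // With "an" added to the set, only "investment" remains.
        Set<String> patched = new HashSet<>(STOP_WORDS);
        patched.add("an");
        System.out.println(analyze("an investment", patched));
    }
}
```

If the surviving terms are combined with OR, a query that keeps "an" would match every document containing that word, which would be consistent with the extra hits reported for "an investment". The same sketch hints at the compatibility concern raised above: changing the set changes which terms a given query or document produces.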