From: Erik Hatcher
Subject: Re: Phrase Matching
Date: Fri, 8 Sep 2006 10:49:46 -0400
To: lucene-net-user@incubator.apache.org

I recommend you use an analyzer that does not remove stop words, both during indexing and with QueryParser; then all will be well with this particular query. As for your index size, don't be too concerned with that at the moment. I suspect it will be under control as long as you are careful about which fields you store.

    Erik

On Sep 8, 2006, at 9:28 AM, Harris, Tobin wrote:

> Hi Folks,
>
> We want to search our Lucene index for an exact phrase that is:
>
> “t in the park”
>
> Note: the resulting Lucene query string is something like:
>
> body:("t in the park")
>
> However, Lucene uses the default stop word list and therefore translates this phrase to simply “park”. This gets a LOT of matches of course :-)
>
> Any idea how I would set up Lucene so that we can search for phrases in this way? I’m concerned about removing stop words since it may cause the index to grow huge (we currently add 60,000 items to our index per day).
>
> Any help mucho appreciated.
>
> Thanks
>
> Tobin
>
> Socena Ltd
> Software Development Services
> www.socena.com
>
> t: +44 113 2179134 f: +44 870 762 6678
> w: www.tobinharris.com e: tobin@tobinharris.com
> s: tobinharris
> 35 Kirkstall Avenue, Leeds, LS5 3DW, UK
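
To make the stop-word advice at the top of this reply concrete, here is a rough, self-contained Python sketch. It is a toy model only and does not use Lucene's or Lucene.Net's API: the `STOP_WORDS` set is an assumed, hand-picked subset of a typical English stop list, and `analyze`, `phrase_matches`, and `search` are hypothetical helper names invented for illustration. It shows why the phrase collapses to "park" when the same stop-word-removing analysis is applied to both documents and query, and why the phrase becomes selective again when stop words are kept on both sides.

```python
# Toy illustration only; this is NOT Lucene's analyzer or query code.
# STOP_WORDS is an assumed subset of a typical English stop word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "s", "t", "the", "to"}


def analyze(text, remove_stop_words):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return tokens


def phrase_matches(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occur as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def search(docs, phrase, remove_stop_words):
    """Apply the same analysis to documents and query, then phrase-match."""
    query_tokens = analyze(phrase, remove_stop_words)
    return [
        doc for doc in docs
        if phrase_matches(analyze(doc, remove_stop_words), query_tokens)
    ]


DOCS = [
    "We saw the band at T in the Park",
    "Walk the dog in the park",
]

if __name__ == "__main__":
    # With stop words removed, the query is just ["park"]: both documents match.
    print(search(DOCS, "t in the park", remove_stop_words=True))
    # With stop words kept, only the document containing the full phrase matches.
    print(search(DOCS, "t in the park", remove_stop_words=False))
```

The key design point in the sketch is that `search` runs the query through the same analysis as the documents, which mirrors the recommendation to use the same non-stop-word-removing analyzer for indexing and for QueryParser.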