From: "Renaud Waldura"
Subject: RE: Wildcard query with untokenized punctuation (again)
Date: Thu, 14 Jun 2007 10:29:24 -0700

Thanks guys, I like it!
I'm already applying some regexps before query parsing anyway, so it's just
another pass.

Now, I'm not sure how to do that without breaking another QP feature that I
kind of like: the query <<smith,ann>> is parsed to PhraseQuery("smith ann").
And that seems right, from a user standpoint.

In fact, considering this, I realize <<smith,ann*>> should be parsed to
MultiPhraseQuery("smith", "ann*"), not <<+smith +ann*>> as I said earlier.

Brrrr. Getting hairy. Any hope?

--Renaud

-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com]
Sent: Thursday, June 14, 2007 6:43 AM
To: java-user@lucene.apache.org
Subject: Re: Wildcard query with untokenized punctuation (again)

Gotta agree with Erick here... best idea is just to preprocess the query
before sending it to the QueryParser. My first thought is always to get out
the sledgehammer...

- Mark

Erick Erickson wrote:
> Well, perhaps the simplest thing would be to pre-process the query and
> make the comma into a whitespace before sending anything to the query
> parser. I don't know how generalizable that sort of solution is in
> your problem space though....
>
> Best
> Erick
>
> On 6/13/07, Renaud Waldura wrote:
>>
>> My very simple analyzer produces tokens made of digits and/or letters
>> only. Anything else is discarded. E.g. the input "smith,anna" gets
>> tokenized as 2 tokens, first "smith" then "anna".
>>
>> Say I have indexed documents that contained both "smith,anna" and
>> "smith,annanicole". To find them, I enter the query <<smith,ann*>>.
>> The stock Lucene 2.0 query parser produces a PrefixQuery for the
>> single token "smith,ann". This token doesn't exist in my index, and I
>> don't get a match.
>>
>> I have found some references to this:
>>
>> http://www.nabble.com/Wildcard-query-with-untokenized-punctuation-tf3378386.html
>>
>> but I don't understand how I can fix it. Comma-separated terms like
>> this can appear in any field; I don't think I can create an
>> untokenized field.
>>
>> Really what I would like in this case is for the comma to be
>> considered whitespace, and the query to be parsed to <<+smith
>> +ann*>>. Any way I can do that?
>>
>> --Renaud

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
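
For reference, the pre-processing pass discussed in this thread could be sketched roughly as follows, in plain Java with no Lucene dependency. The class `QueryPreprocessor`, its method names, and the separator set `,;` are all hypothetical and not part of Lucene. The sketch assumes that only whitespace-delimited chunks made purely of letters, digits, wildcards and separators should be rewritten; anything carrying other query syntax (field prefixes, quotes, boosts) is passed through untouched. A chunk without a wildcard becomes a quoted phrase, so the PhraseQuery behaviour is kept. A chunk with a wildcard becomes a list of required terms, which is only an approximation: unlike a MultiPhraseQuery it does not require the terms to be adjacent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: rewrites a raw query string before parsing. */
final class QueryPreprocessor {

    // Assumption: the punctuation to treat as whitespace. Extend as needed.
    private static final String SEPARATORS = ",;";

    private static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }

    private static boolean isSeparator(char c) {
        return SEPARATORS.indexOf(c) >= 0;
    }

    /** A chunk is "plain" if it holds only letters, digits, wildcards and separators. */
    private static boolean isPlain(String chunk) {
        for (char c : chunk.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && !isWildcard(c) && !isSeparator(c)) {
                return false;
            }
        }
        return true;
    }

    /** Splits a plain chunk on separators, dropping empty pieces. */
    private static List<String> split(String chunk) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : chunk.toCharArray()) {
            if (!isSeparator(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    /** Rewrites one whitespace-delimited chunk; non-plain chunks are returned as-is. */
    static String rewriteChunk(String chunk) {
        if (!isPlain(chunk)) {
            return chunk; // other query syntax present: leave it to the parser
        }
        List<String> parts = split(chunk);
        if (parts.isEmpty()) {
            return ""; // chunk was nothing but separators
        }
        if (parts.size() == 1) {
            return parts.get(0);
        }
        boolean hasWildcard = false;
        for (char c : chunk.toCharArray()) {
            hasWildcard |= isWildcard(c);
        }
        if (!hasWildcard) {
            // Keep phrase semantics: smith,ann becomes "smith ann"
            return "\"" + String.join(" ", parts) + "\"";
        }
        // Approximation of the wildcard case: smith,ann* becomes +smith +ann*
        return "+" + String.join(" +", parts);
    }

    /** Rewrites a whole query string chunk by chunk. */
    static String rewrite(String rawQuery) {
        StringBuilder out = new StringBuilder();
        for (String chunk : rawQuery.trim().split("\\s+")) {
            String rewritten = rewriteChunk(chunk);
            if (rewritten.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(rewritten);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("smith,ann"));  // "smith ann"
        System.out.println(rewrite("smith,ann*")); // +smith +ann*
    }
}
```

The rewritten string would then be handed to the QueryParser as usual. Getting a true MultiPhraseQuery("smith", "ann*") cannot be done by string rewriting alone: the prefix has to be expanded against the index first, and the matching terms added to the query as an array at the second position.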