From: "Uwe Schindler"
Subject: RE: Re:RE: Does the string "Cla$$War" affect Lucene?
Date: Tue, 14 Aug 2012 18:52:57 +0200

Please read the answer I posted earlier. It explains exactly what happens, so you can work out what kind of search input produces this. If you want to change the behavior, rethink your tokenization.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: zhoucheng2008 [mailto:zhoucheng2008@gmail.com]
> Sent: Tuesday, August 14, 2012 6:46 PM
> To: java-user
> Subject: Re: Re:RE: Does the string "Cla$$War" affect Lucene?
>
> Another phrase, "$FREE.99", causes the same problem.
>
> What are the ultimate solutions? How many cases would cause this problem?
>
> Thanks
>
> ------------------ Original ------------------
> From: "dyzc2010 "<1393975679@qq.com>;
> Date: Tue, Aug 14, 2012 11:27 PM
> To: "java-user";
> Subject: Re: Re:RE: Does the string "Cla$$War" affect Lucene?
>
> I know the reason for the lack of hits.
>
> Without configuring autoGeneratePhraseQueries, a term like "I love you" is
> split into "I", "love", and "you", and therefore gets quite a lot of hits.
>
> With it configured, by contrast, the term is not split, and there are no hits.
>
> ------------------ Original ------------------
> From: "Jack Krupansky";
> Date: Tue, Aug 14, 2012 11:01 PM
> To: "java-user";
> Subject: Re: Re:RE: Does the string "Cla$$War" affect Lucene?
>
> Try enclosing "Cla$$War" in quotes, which should have the same effect as
> turning on auto-phrase query generation.
>
> qp.parse("\"Cla$$War\"")
>
> (You only need to use "escape" for characters which are query syntax
> characters.)
>
> And do a q.toString to see how the term was analyzed.
>
> I'm surprised that you got no hits with autoGeneratePhraseQueries, which
> suggests that maybe the index didn't use the same analyzer or maybe the
> literal text in the title is not exactly what you think it is.
>
> You could use the WhitespaceAnalyzer, but that would leave leading and
> trailing punctuation.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: zhoucheng2008
> Sent: Tuesday, August 14, 2012 10:42 AM
> To: java-user
> Subject: Re:RE: Does the string "Cla$$War" affect Lucene?
>
> Sounds like some other analyzer can do the trick?
>
> Anyway, I don't want a slower Lucene, and I want to treat "Cla$$War" as a
> whole word.
>
> What solution is left?
>
> Thanks.
>
> ------------------ Original ------------------
> From: "Uwe Schindler";
> Date: Tue, Aug 14, 2012 04:56 PM
> To: "java-user";
> Subject: RE: Does the string "Cla$$War" affect Lucene?
>
> Hi,
>
> If you are using StandardAnalyzer, then "Cla$$War" is split at the $ signs,
> so it searches for two tokens, "cla" and "war". If autogenerate phrase
> queries is enabled for QueryParser, it will then create a phrase query
> "cla war" out of it, which is slower because positions are involved. If
> autogenerate phrases is not enabled, Lucene still has to search for 2
> terms, so it might get slower if "cla" or "war" hit many documents. Whether
> it is enabled depends on the matchVersion parameter passed to the ctor:
> http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/queryParser/QueryParser.html
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Ian Lea [mailto:ian.lea@gmail.com]
> > Sent: Tuesday, August 14, 2012 10:39 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Does the string "Cla$$War" affect Lucene?
> >
> > Sounds extremely unlikely. What is the query? What analyzer? What version of
> > lucene?
> > What about other strings containing $$?
> >
> > --
> > Ian.
> >
> > On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008 wrote:
> > > Hi,
> > >
> > > I have a big index, and when I searched it with a title string "Cla$$War",
> > > Lucene became very slow. It doesn't happen when I search with other title
> > > strings such as "Gone with Wind". Does the "$$" affect the search performance?
> > >
> > > Thanks,
> > > Cheng
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
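
For readers following the thread, the splitting described in the 04:56 PM reply can be illustrated with a rough sketch. The class below is a hypothetical, simplified stand-in written in plain Java; it is not Lucene's StandardAnalyzer and does not use any Lucene API. It assumes only what the thread states: "Cla$$War" is broken at the $ signs and lowercased, leaving the two tokens "cla" and "war". Stop words and the finer tokenizer rules are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified illustration of the tokenization discussed in the thread.
 * Assumption: any character that is not a letter or digit ends a token,
 * and tokens are lowercased. Lucene's real analyzers are more involved.
 */
final class TokenSplitSketch {

    /** Splits text into lowercase runs of letters and digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator such as '$' closes the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [cla, war]: two terms to look up instead of one.
        System.out.println(tokenize("Cla$$War"));
    }
}
```

Under these assumptions the query becomes a search for two terms (or a phrase of two terms), which is why the advice in the thread is to inspect q.toString and, if the title must be matched as a single word, to change the tokenization at both index and query time.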