lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Re:RE: Does the string "Cla$$War" affect Lucene?
Date Tue, 14 Aug 2012 16:52:57 GMT
Please read my earlier answer; it explains exactly what happens, so you can
work out which kinds of search input produce this. If you want to change the
behavior, rethink your tokenization.
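
To make the tokenization point concrete, here is a rough sketch in plain Java,
with no Lucene on the classpath. The class and method names are made up for
illustration, and the regex splits only approximate the behavior described in
this thread; they are not the analyzers' real word-break rules. The first
method mimics an analyzer that breaks on punctuation and lowercases, the second
mimics whitespace-only tokenization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: approximates two tokenization styles.
final class TokenizeSketch {

    // Assumption: break on any run of non-letter, non-digit characters and
    // lowercase the pieces, roughly what the thread describes for "Cla$$War".
    static List<String> splitOnPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {               // skip the empty leading piece
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Assumption: break on whitespace only, keeping punctuation and case.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnPunctuation("Cla$$War"));        // prints [cla, war]
        System.out.println(splitOnPunctuation("$FREE.99"));        // prints [free, 99]
        System.out.println(splitOnWhitespace("Cla$$War (2012)"));  // prints [Cla$$War, (2012)]
    }
}
```

With the first style, a single title word turns into two common terms; with
the second, the word stays whole but keeps any attached punctuation, so the
index and the query must both be built the same way.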

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: zhoucheng2008 [mailto:zhoucheng2008@gmail.com]
> Sent: Tuesday, August 14, 2012 6:46 PM
> To: java-user
> Subject: Re: Re:RE: Does the string "Cla$$War" affect Lucene?
> 
> Another phrase "$FREE.99" causes the same problem.
> 
> 
> What are the ultimate solutions? How many cases would cause this problem?
> 
> 
> Thanks
> 
> 
> 
> 
> ------------------ Original ------------------
> From:  "dyzc2010  "<1393975679@qq.com>;
> Date:  Tue, Aug 14, 2012 11:27 PM
> To:  "java-user"<java-user@lucene.apache.org>;
> 
> Subject:  Re: Re:RE: Does the string "Cla$$War" affect Lucene?
> 
> 
> 
> I know the reason for the lack of hits.
> 
> 
> Without configuring autoGeneratePhraseQueries, a term like "I love you" is
> split into "I", "love", and "you", and therefore gets quite a lot of hits.
> 
> 
> With it configured, the term is not split, and there are no hits.
> 
> 
> 
> 
> ------------------ Original ------------------
> From:  "Jack Krupansky"<jack@basetechnology.com>;
> Date:  Tue, Aug 14, 2012 11:01 PM
> To:  "java-user"<java-user@lucene.apache.org>;
> 
> Subject:  Re: Re:RE: Does the string "Cla$$War" affect Lucene?
> 
> 
> 
> Try enclosing "Cla$$War" in quotes, which should have the same effect as
> turning on auto-phrase query generation.
> 
> qp.parse("\"Cla$$War\"")
> 
> (You only need to use "escape" for characters which are query syntax
> characters.)
> 
> And do a q.toString to see how the term was analyzed.
> 
> I'm surprised that you got no hits with autoGeneratePhraseQueries - which
> suggests that maybe the index didn't use the same analyzer or maybe the
> literal text in the title is not exactly what you think it is.
> 
> You could use the WhitespaceAnalyzer, but that would leave leading and
> trailing punctuation.
> 
> -- Jack Krupansky
> 
> -----Original Message-----
> From: zhoucheng2008
> Sent: Tuesday, August 14, 2012 10:42 AM
> To: java-user
> Subject: Re:RE: Does the string "Cla$$War" affect Lucene?
> 
> Sounds like some other analyzer can do the trick?
> 
> 
> Anyway, I don't want a slower Lucene, and I want to treat "Cla$$War" as a
> whole word.
> 
> 
> What solutions are left?
> 
> 
> Thanks.
> 
> 
> 
> 
> ------------------ Original ------------------
> From:  "Uwe Schindler"<uwe@thetaphi.de>;
> Date:  Tue, Aug 14, 2012 04:56 PM
> To:  "java-user"<java-user@lucene.apache.org>;
> 
> Subject:  RE: Does the string "Cla$$War" affect Lucene?
> 
> 
> 
> Hi,
> 
> If you are using StandardAnalyzer, then "Cla$$War" is split at the $ signs,
> so it searches for two tokens, "cla" and "war". If autogenerate phrase
> queries is enabled for QueryParser, it will then create a phrase query
> "cla war" out of it, which is slower because positions are involved. If
> autogenerate phrases is not enabled, Lucene still has to search for 2
> terms, so it might get slower if "cla" or "war" hit many documents. Whether
> it is enabled depends on the matchVersion parameter passed to the
> constructor:
> http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/queryParser/QueryParser.html
> 
> Uwe
> 
> 
> 
> > -----Original Message-----
> > From: Ian Lea [mailto:ian.lea@gmail.com]
> > Sent: Tuesday, August 14, 2012 10:39 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Does the string "Cla$$War" affect Lucene?
> >
> > Sounds extremely unlikely.  What is the query?  What analyzer? What
> > version of lucene?  What about other strings containing $$?
> >
> >
> > --
> > Ian.
> >
> >
> > On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008
> > <zhoucheng2008@gmail.com> wrote:
> > > Hi,
> > >
> > >
> > > I have a big index, and when I searched it with a title string
> > > "Cla$$War", Lucene became very slow. It doesn't happen when I search
> > > with other title strings such as "Gone with Wind". Does the "$$" affect
> > > the search performance?
> > >
> > >
> > > Thanks,
> > > Cheng
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 



