lucene-java-user mailing list archives

From "zhoucheng2008" <zhoucheng2...@gmail.com>
Subject Re: Re:RE: Does the string "Cla$$War" affect Lucene?
Date Tue, 14 Aug 2012 16:46:27 GMT
Another phrase, "$FREE.99", causes the same problem.


What is the ultimate solution? How many other strings could cause this problem?


Thanks




------------------ Original ------------------
From:  "dyzc2010"<1393975679@qq.com>;
Date:  Tue, Aug 14, 2012 11:27 PM
To:  "java-user"<java-user@lucene.apache.org>; 

Subject:  Re: Re:RE: Does the string "Cla$$War" affect Lucene?



I found the reason for the zero hits.


Without autoGeneratePhraseQueries, a term like "I love you" is split into "I",
"love", and "you", so it gets quite a lot of hits.


With autoGeneratePhraseQueries enabled, the term is not split, and there are no hits.
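To make the two behaviours concrete, here is a hypothetical plain-Java sketch (not Lucene's QueryParser code; the class and method names are invented for illustration). It assumes a single whitespace-free chunk such as Cla$$War has already been analyzed into the tokens "cla" and "war", and builds the query in Lucene-like syntax either as independent optional terms or as one phrase.

```java
import java.util.List;

// Hypothetical sketch, NOT Lucene's QueryParser: it only mimics the two query
// shapes a parser can build when one chunk of user input analyzes into
// several tokens.
final class AutoPhraseSketch {

    // autoGeneratePhrase == false: independent optional terms (default OR),
    //   so a document containing any one of the tokens can match.
    // autoGeneratePhrase == true: a single phrase, so the tokens must appear
    //   adjacent and in order (this needs position data, hence slower).
    static String buildQuery(String field, List<String> tokens, boolean autoGeneratePhrase) {
        if (tokens.size() == 1) {
            // A single token is a plain term query either way.
            return field + ":" + tokens.get(0);
        }
        if (autoGeneratePhrase) {
            return field + ":\"" + String.join(" ", tokens) + "\"";
        }
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("cla", "war");
        System.out.println(buildQuery("title", tokens, false)); // title:cla title:war
        System.out.println(buildQuery("title", tokens, true));  // title:"cla war"
    }
}
```

With independent terms, any title containing "cla" or "war" can match; the phrase form only matches when both appear next to each other in that order.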




------------------ Original ------------------
From:  "Jack Krupansky"<jack@basetechnology.com>;
Date:  Tue, Aug 14, 2012 11:01 PM
To:  "java-user"<java-user@lucene.apache.org>; 

Subject:  Re: Re:RE: Does the string "Cla$$War" affect Lucene?



Try enclosing "Cla$$War" in quotes, which should have the same effect as 
turning on auto-phrase query generation.

Query q = qp.parse("\"Cla$$War\"");

(You only need to escape characters that are query syntax characters.)

Then call q.toString() to see how the term was analyzed.

I'm surprised that you got no hits with autoGeneratePhraseQueries - which 
suggests that maybe the index didn't use the same analyzer or maybe the 
literal text in the title is not exactly what you think it is.

You could use the WhitespaceAnalyzer, but that would leave leading and 
trailing punctuation.
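To illustrate the trade-off Jack mentions, the sketch below splits only on whitespace, which is roughly what WhitespaceAnalyzer does (it also does not lowercase). It is hypothetical plain Java, not Lucene code, and the class name is invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of whitespace-only tokenization, roughly what
// WhitespaceAnalyzer does: split on whitespace and keep every other
// character, including '$' and punctuation. Not Lucene code.
final class WhitespaceSketch {

    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // "Cla$$War" survives as one token...
        System.out.println(tokenize("Cla$$War"));
        // ...but so does the trailing period on "Wind."
        System.out.println(tokenize("Gone with the Wind."));
    }
}
```

"Cla$$War" stays whole, but "Wind." keeps its period, so under this scheme a search for Wind would not match that token exactly.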

-- Jack Krupansky

-----Original Message----- 
From: zhoucheng2008
Sent: Tuesday, August 14, 2012 10:42 AM
To: java-user
Subject: Re:RE: Does the string "Cla$$War" affect Lucene?

Sounds like some other analyzer could do the trick?


Anyway, I don't want a slower Lucene, and I want to treat "Cla$$War" as a 
whole word.


What solutions are left?


Thanks.




------------------ Original ------------------
From:  "Uwe Schindler"<uwe@thetaphi.de>;
Date:  Tue, Aug 14, 2012 04:56 PM
To:  "java-user"<java-user@lucene.apache.org>;

Subject:  RE: Does the string "Cla$$War" affect Lucene?



Hi,

If you are using StandardAnalyzer, then "Cla$$War" is split at the $ signs,
so it searches for two tokens, "cla" and "war". If autogenerate phrase
queries is enabled for QueryParser, it will then create a phrase query
"cla war" out of it, which is slower because positions are involved. If
autogenerate phrases is not enabled, Lucene still has to search for 2
terms, so it might get slower if "cla" or "war" hit many documents. Whether
it is enabled depends on the matchVersion parameter passed to the ctor:
http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/queryParser/QueryParser.html
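A rough way to see the split described above is the toy tokenizer below. It is a hypothetical simplification, not StandardAnalyzer: it just breaks on every character that is not a letter or digit and lowercases the pieces, whereas the real StandardTokenizer follows Unicode word-break rules and StandardAnalyzer also removes stop words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, greatly simplified stand-in for StandardAnalyzer: break on
// every char that is not a letter or digit, then lowercase each piece.
// The real tokenizer uses Unicode word-break rules and a stop-word filter,
// so treat this only as a rough model.
final class SplitSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // '$', '.', spaces, etc. end the current token and are dropped.
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Cla$$War")); // [cla, war]
        System.out.println(tokenize("$FREE.99")); // [free, 99]
    }
}
```

In this toy model "$FREE.99" falls apart just like "Cla$$War", which would fit the report that it behaves similarly; printing the parsed query with the real analyzer is the way to confirm what Lucene actually produces.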

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Tuesday, August 14, 2012 10:39 AM
> To: java-user@lucene.apache.org
> Subject: Re: Does the string "Cla$$War" affect Lucene?
>
> Sounds extremely unlikely.  What is the query?  What analyzer?  What
> version of Lucene?  What about other strings containing $$?
>
>
> --
> Ian.
>
>
> On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008
> <zhoucheng2008@gmail.com> wrote:
> > Hi,
> >
> >
> > I have a big index, and when I searched it with the title string "Cla$$War",
> > Lucene became very slow. It doesn't happen when I search with other title
> > strings such as "Gone with Wind". Does the "$$" affect search performance?
> >
> >
> > Thanks,
> > Cheng
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

