lucene-java-user mailing list archives

From "dyzc" <1393975...@qq.com>
Subject Re: Re:RE: Does the string "Cla$$War" affect Lucene?
Date Tue, 14 Aug 2012 15:18:52 GMT
I should have been clearer.

When I said "no hits", I was referring to no hits for other, ordinary terms such as
"Gone with Wind".

I do analyze the query. When autoGeneratePhraseQueries is set to true, the term is parsed
as "cla war", with a space in between.

When it is set to false, it becomes two separate terms: "cla" and "war".

I hesitate to use quotes because I would then have to do so for every other query. That is a cost.

The best option may be to use another analyzer, as suggested in another response.

But I don't know which one is a good substitute for the standard analyzer.




------------------ Original ------------------
From:  "Jack Krupansky"<jack@basetechnology.com>;
Date:  Tue, Aug 14, 2012 11:01 PM
To:  "java-user"<java-user@lucene.apache.org>; 

Subject:  Re: Re:RE: Does the string "Cla$$War" affect Lucene?



Try enclosing "Cla$$War" in quotes, which should have the same effect as 
turning on auto-phrase query generation.

qp.parse("\"Cla$$War\"")

(You only need to use "escape" for characters which are query syntax 
characters.)
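
To illustrate why "Cla$$War" needs no escaping, here is a minimal sketch of what escaping query syntax characters amounts to. It is a simplified stand-in and not Lucene code: the class name EscapeSketch is hypothetical, and the character set is an assumption based on the Lucene 3.x query parser syntax. In real code you would call the static QueryParser.escape method instead.

```java
// Simplified stand-in for the idea behind QueryParser.escape; NOT Lucene code.
class EscapeSketch {

    // Assumed set of query syntax characters (Lucene 3.x query parser syntax).
    static final String SYNTAX_CHARS = "\\+-!():^[]\"{}~*?|&";

    // Put a backslash in front of every character that is query syntax.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SYNTAX_CHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '$' is not in the set, so the string comes back unchanged.
        System.out.println(escape("Cla$$War"));
        // ':' is in the set, so it gets a backslash in front.
        System.out.println(escape("Cla:War"));
    }
}
```

Under these assumptions the "$" characters pass through untouched, which is why only the surrounding quotes matter in the parse call above.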

And do a q.toString to see how the term was analyzed.

I'm surprised that you got no hits with autoGeneratePhraseQueries - which 
suggests that maybe the index didn't use the same analyzer or maybe the 
literal text in the title is not exactly what you think it is.

You could use the WhitespaceAnalyzer, but that would leave leading and 
trailing punctuation.
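
A rough sketch of that trade-off, using plain Java string splitting as a stand-in for whitespace-only tokenization. This is not Lucene code; the class name WhitespaceSketch and the sample title are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for whitespace-only tokenization; NOT Lucene code.
class WhitespaceSketch {

    // Split on whitespace only: no lowercasing, no punctuation stripping.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The title survives as one token.
        System.out.println(whitespaceTokens("Cla$$War"));
        // Surrounding punctuation stays attached to the tokens.
        System.out.println(whitespaceTokens("(Cla$$War), a review."));
    }
}
```

In this sketch "Cla$$War" stays a single token, but "(Cla$$War)," in the second input keeps its parenthesis and comma, so a query for the bare title would not match that token. Note also that WhitespaceAnalyzer does not lowercase, so case would have to match as well.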

-- Jack Krupansky

-----Original Message----- 
From: zhoucheng2008
Sent: Tuesday, August 14, 2012 10:42 AM
To: java-user
Subject: Re:RE: Does the string "Cla$$War" affect Lucene?

Sounds like some other analyzer could do the trick?


Anyway, I don't want Lucene to be slower, and I want to treat "Cla$$War" as a 
whole word.


What solutions are left?


Thanks.




------------------ Original ------------------
From:  "Uwe Schindler"<uwe@thetaphi.de>;
Date:  Tue, Aug 14, 2012 04:56 PM
To:  "java-user"<java-user@lucene.apache.org>;

Subject:  RE: Does the string "Cla$$War" affect Lucene?



Hi,

If you are using StandardAnalyzer, then "Cla$$War" is split at the $ signs,
so it searches for two tokens, "cla" and "war". If autogenerate phrase
queries is enabled for QueryParser, it will then create a phrase query "cla
war" out of it, which is slower because positions are involved. If
autogenerate phrases is not enabled, Lucene still has to search for two
terms, so it might get slower if "cla" or "war" hit many documents. Whether
it is enabled depends on the matchVersion parameter passed to the constructor:
http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/queryParser/QueryParser.html
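
The split described above can be approximated with a toy tokenizer. This is only a sketch and not Lucene code: it assumes that breaking on every run of non-letter, non-digit characters and lowercasing is close enough to StandardAnalyzer for this one input (the real analyzer follows more detailed word-break rules and removes stop words). The names SplitSketch, standardLikeTokens and queryShape are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-in for StandardAnalyzer-style tokenization; NOT Lucene code.
class SplitSketch {

    // Split on any run of characters that are not letters or digits, then lowercase.
    static List<String> standardLikeTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Describe the query shape discussed in the thread for a given token list.
    static String queryShape(List<String> tokens, boolean autoGeneratePhraseQueries) {
        if (tokens.size() <= 1) {
            return "single term: " + tokens;
        }
        return autoGeneratePhraseQueries
                ? "phrase query: \"" + String.join(" ", tokens) + "\""
                : "separate terms: " + String.join(", ", tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = standardLikeTokens("Cla$$War");
        System.out.println(tokens);
        System.out.println(queryShape(tokens, true));
        System.out.println(queryShape(tokens, false));
    }
}
```

For "Cla$$War" the sketch yields the tokens "cla" and "war", which then become either one phrase or two independent terms depending on the flag. Either way the query is no longer a lookup of a single rare term, which fits the slowdown reported at the start of the thread.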

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Tuesday, August 14, 2012 10:39 AM
> To: java-user@lucene.apache.org
> Subject: Re: Does the string "Cla$$War" affect Lucene?
>
> Sounds extremely unlikely.  What is the query?  What analyzer?  What
> version of lucene?  What about other strings containing $$?
>
>
> --
> Ian.
>
>
> On Tue, Aug 14, 2012 at 9:13 AM, zhoucheng2008
> <zhoucheng2008@gmail.com> wrote:
> > Hi,
> >
> >
> > I have a big index, and when I searched it with a title string
> > "Cla$$War", Lucene became very slow. It doesn't happen when I search
> > with other title strings such as "Gone with Wind". Does the "$$" affect
> > the search performance?
> >
> >
> > Thanks,
> > Cheng
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

