lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley" <yo...@apache.org>
Subject Re: Standard Analyzer Escapes
Date Fri, 13 Jul 2007 15:46:57 GMT
I just tried some things fast via the Solr admin interface, and
everything seems fine.
I think you are probably confusing what the parser does vs what the
analyzer does.
Try your tests with an un-tokenized field to remove that effect.

-Yonik

On 7/13/07, Walt Stoneburner <walt.stoneburner@gmail.com> wrote:
> In reading the documentation for escape characters, I'm having a
> little trouble understanding what it wants me to do for certain
> special cases.
>
> http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters
> says: "Lucene supports escaping special characters that are part of
> the query syntax. The current list special characters are:   + - && ||
> ! ( ) { } [ ] ^ " ~ * ? : \     To escape these character use the \
> before the character."
>
> Specifically, I'm curious about the double characters && and || and
> how they should be properly escaped.
>
> Experimentation showed some very strange things with the StandardAnalyzer.
>
> Using Luke, I get some interesting mappings.
>   AT&T    becomes  at&t    (as expected)
>   AT&&T  becomes  t   (tricky... at is now taken as a stop word; fine
> makes sense)
>
> ..but what about...   "AT&&T"   ...nope, still t.
>
> AAA&BBB becomes aaa&bbb    ...correct
> AAA&&BBB becomes   aaa bbb   ...ampersand becomes a space?
> "AAA&&BBB" is also    aaa bbb
>
> AAA\&BBB correctly is   aaa&bbb   ...just as before
> AAA\&&BBB   is  aaa bbb   ...but perhaps we got the escape wrong.
>
> Is '&&' special "character" and is it escaped as \&& or escaped as
> \&\& ...let's find out.
>
> AAA\&\&BBB   is also  aaa bbb   ...perhaps we need quotes?
> "AAA\&\&BBB"   is also  aaa bbb   ...I can't seem to get the escape to work.
>
> How about this?
> AAA&BBB&CCC    strangely becomes   aaa&bbb ccc
>
> Even when escaped?
> AAA\&BBB\&CCC  is also    aaa&bbb ccc    ...appears so.
>
> What about...
> AAA&BBB&CCC&DDD   becomes   aaa&bbb ccc&ddd  ....whoa, not expecting
that.
>
> AAA&&BBB&&CCC&&DDD  becomes   aaa bbb ccc ddd  ...if &&
means AND, ok...
>
> AAA\&&BBB\&&CCC\&&DDD   no change  aaa bbb ccc ddd
>
> AAA\&\&BBB\&\&CCC\&\&DDD  also no change  aaa bbb ccc ddd
>
>
> It appears I literally cannot search for the token with two ampersands
> in it, whether they are touching or not.
>
> Clearly I'm missing something.  Is there a way to get any literal
> sequence of my choosing, using escapes, as a term in the Lucene
> expression?
>
> -Walt Stoneburner
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message