lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Special characters
Date Thu, 10 Aug 2006 17:33:30 GMT
See below...

On 8/10/06, Pillinger, Adrian <apill@dolby.co.uk> wrote:
>
> I am indexing some text in a java object that is "%772B" with the
> standard analyser and Lucene 2.
>
> Should I be able to search for this with the same text as the query, or
> do I need to do any escaping of characters?


Probably not, because I doubt that you'll have the '%' in the index (but I
admit I don't know for sure). Get Luke (http://www.getopt.org/luke/) and
check to be sure. That will tell you exactly what is in the index. I suspect
you'll find "772B" but the '%' will simply be absent.

Also, watch capitalization. As I remember, the StandardAnalyzer lowercases
your stream, so the indexed term may not match the case you typed.

You probably want a different analyzer for *both* indexing and searching if
you really need to search such strings. Try WhitespaceAnalyzer, and perhaps
store your values UN_TOKENIZED (but watch the latter: it assumes you're
controlling your tokens yourself and not relying on the analyzer to break up
your input stream).
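
To make the "same analyzer on both sides" point concrete, here is a toy
sketch in plain Java. It does not use Lucene at all: the two tokenizer
methods are hypothetical stand-ins I made up (one keeps everything between
whitespace, one strips punctuation and lowercases, which is roughly what I
*suspect* StandardAnalyzer does to this input). Treat it as an illustration
of the mismatch, not as a description of Lucene's real tokenization.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

class AnalyzerMismatchSketch {

    // Hypothetical stand-in for whitespace-only analysis:
    // split on whitespace and keep every other character, case included.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for an analyzer that drops punctuation and
    // lowercases. This is an assumption, not Lucene's actual grammar.
    static List<String> strippingLowercaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // "Index" a value: the index is just the set of terms the analyzer emits.
    static Set<String> index(String text, Function<String, List<String>> analyzer) {
        return new HashSet<>(analyzer.apply(text));
    }

    // A query matches only if every analyzed query term is in the index.
    static boolean matches(Set<String> indexedTerms, String query,
                           Function<String, List<String>> analyzer) {
        List<String> queryTerms = analyzer.apply(query);
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> stripped = index("%772B", AnalyzerMismatchSketch::strippingLowercaseTokens);
        Set<String> verbatim = index("%772B", AnalyzerMismatchSketch::whitespaceTokens);

        System.out.println("stripped index terms: " + stripped);
        System.out.println("verbatim index terms: " + verbatim);

        // Same analysis on both sides finds the value in either index.
        System.out.println(matches(stripped, "%772B", AnalyzerMismatchSketch::strippingLowercaseTokens));
        System.out.println(matches(verbatim, "%772B", AnalyzerMismatchSketch::whitespaceTokens));

        // Mixing analyzers between indexing and searching does not.
        System.out.println(matches(stripped, "%772B", AnalyzerMismatchSketch::whitespaceTokens));
    }
}
```

The takeaway is only that the term stored at index time and the term
produced at query time have to be identical. Luke will show you what the
real analyzer actually stored.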

And if you want to treat different fields differently, think about
PerFieldAnalyzerWrapper.
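
The idea behind per-field analysis, as a plain-Java toy (again no Lucene
here; the class, its method names, and the field names "code" and "body"
are all made up for illustration): keep a default analysis, register
overrides by field name, and pick one when a field is analyzed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class PerFieldAnalysisSketch {
    // Analysis used for any field without an explicit override.
    private final Function<String, String> defaultAnalysis;
    // Field name -> analysis override.
    private final Map<String, Function<String, String>> perField = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, String> defaultAnalysis) {
        this.defaultAnalysis = defaultAnalysis;
    }

    // Register a field-specific analysis.
    void addAnalysis(String fieldName, Function<String, String> analysis) {
        perField.put(fieldName, analysis);
    }

    // Normalize a single-term value with the analysis chosen for its field.
    String analyze(String fieldName, String value) {
        return perField.getOrDefault(fieldName, defaultAnalysis).apply(value);
    }

    public static void main(String[] args) {
        // Default: strip punctuation and lowercase (an assumed behavior).
        PerFieldAnalysisSketch wrapper = new PerFieldAnalysisSketch(
                s -> s.replaceAll("[^\\p{L}\\p{N}]+", "").toLowerCase(Locale.ROOT));
        // Hypothetical "code" field: keep the value exactly as given.
        wrapper.addAnalysis("code", s -> s);

        System.out.println(wrapper.analyze("code", "%772B"));
        System.out.println(wrapper.analyze("body", "%772B"));
    }
}
```

Whatever you set up this way, use the same wrapper for indexing and for
searching so each field is analyzed identically on both sides.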

Best
Erick


> Thanks
>
> Adrian
