lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: $ or £ symbols are excluded from Search Query
Date Thu, 30 Jul 2009 20:38:09 GMT
WhitespaceAnalyzer won't fold case, and it won't strip any "odd" characters
out. In fact, it won't do anything except break on white space. You might
want to write your own analyzer that incorporates some of the filters,
especially LowerCaseFilter.
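
To make the difference concrete, here is a rough plain-Java sketch of what
the two tokenizing strategies do to text containing $ and £. It is only an
emulation and uses no Lucene classes: the class name AnalyzerSketch and both
method names are made up for illustration. In Lucene terms, the first method
approximates a WhitespaceTokenizer wrapped in a LowerCaseFilter, and the
second approximates letter-based tokenizing with lowercasing, which is
roughly what SimpleAnalyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java emulation for illustration only; this is NOT Lucene code.
// AnalyzerSketch and its method names are hypothetical.
class AnalyzerSketch {

    // Approximates whitespace tokenizing followed by lowercasing:
    // split on whitespace only, keep every other character as-is.
    static List<String> whitespaceLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Approximates letter-based tokenizing followed by lowercasing:
    // any non-letter character (including $ and £) ends the current token
    // and is itself discarded.
    static List<String> letterLowercase(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // \u00A3 is the pound sign, escaped to avoid source-encoding issues.
        String input = "$Ahmet$ \u00A3Arslan\u00A3 te$s\u00A3t";
        System.out.println(whitespaceLowercase(input));
        System.out.println(letterLowercase(input));
    }
}
```

The first method keeps the currency symbols inside each token; the second
drops them and breaks te$s£t into separate pieces. Whichever approach you
choose, use the same analysis at index time and at query time, or the terms
won't match.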

On Wed, Jul 29, 2009 at 9:04 AM, cbowditch <bowditch_chris@hotmail.com> wrote:

>
>
>
> Ahmet Arslan wrote:
> >
> >
> >> Can anyone tell me how I can search my index for $ or £.
> >
> > $, £, and the euro character are not among the reserved characters
> > defined by QueryParser. I just verified this with the code below (in
> > Lucene 2.4.1):
> >
> > org.apache.lucene.queryParser.QueryParser qp = new
> > org.apache.lucene.queryParser.QueryParser("title", new
> > WhitespaceAnalyzer());
> > Query q = qp.parse("$ahmet$ AND £arslan£ te$s£t");
> > System.out.println(q.toString());
> >
> > The output is: +title:$ahmet$ +title:£arslan£ title:te$s£t
> >
> > Your analyzer is probably eating up those characters. Are you using
> > StandardAnalyzer or SimpleAnalyzer? LetterTokenizer and StandardTokenizer
> > split words at those characters. If that's the cause of the problem,
> > use something like WhitespaceAnalyzer, or construct your queries
> > programmatically using the Lucene Query API (e.g. TermQuery).
> >
>
> Thanks for the suggestions. I had tried SimpleAnalyzer and StandardAnalyzer
> within Luke. When I switched to WhitespaceAnalyzer, the $ and £ symbols were
> preserved.
>
> Within my own application we seem to be using a custom analyzer that
> subclasses Analyzer. What is the implication of switching the base class
> to WhitespaceAnalyzer?
>
> Thanks,
>
> Chris
