lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Issues with Special Characters
Date Tue, 16 Sep 2008 14:15:14 GMT
I question whether your example actually searches what you
think. What I suggest is that you get a copy of Luke and
look at what your queries actually produce. That'll give you
a much better idea of what happens under the covers.

Also, query.toString() is your friend. Try printing out your
queries (again, I recommend a small test program) to get
a better feel for what each analyzer does to your queries.

Of course you have to escape the ':' character when it does
not refer to a field. How else could you imagine that the
query parser can distinguish between a field designation
and a field value? Consider the query:

f:stuff g:morestuff

The parser has to understand that the ':' separates the
field 'f' from the value 'stuff' and that 'g' is another field. So
if you want to parse
f:stuff g:somemore:stuff
the parser gets all confused by whether the ':' between
somemore and stuff is a field delimiter or a value, unless
it's escaped.

Best
Erick

On Tue, Sep 16, 2008 at 9:49 AM, miztaken <justjunktome@gmail.com> wrote:

>
> Hi there,
> I will check that out but what do you suggest for searching??
> without escaping works for query string "fw: fyi.dat" but i have to escape
> :
> char for query string "fw:" so i am having two cases?
>
> Please help me
>
>
>
>
> Erick Erickson wrote:
> >
> > You can easily answer the questions about what WhitespaceTokenizer
> > produces by getting a copy of Luke and looking at your index. Or writing
> > a really simple test program that prints out tokens.
> >
> > At the bottom of this page is a list of special characters for escaping:
> > http://lucene.apache.org/java/docs/queryparsersyntax.html
> >
> > Best
> > Erick
> >
> > On Tue, Sep 16, 2008 at 9:05 AM, miztaken <justjunktome@gmail.com>
> wrote:
> >
> >>
> >> Hi there,
> >> I am using WhiteSpaceAnalyser to index documents. I have used this
> >> because
> >> i
> >> need to split tokens based on space only. Also Tokensized=true
> >> While indexing what does it do with special characters like + - && ||
!
> (
> >> )
> >> { } [ ] ^ " ~ * ? : \, will these characters be indexed or will be
> >> chopped
> >> off? I am confused about this.
> >>
> >> Now i am having problem while searching as well..
> >> for query strings like "jason dartling (e-mail)" and "re: fyi.dat", i
> >> don't
> >> have to escape the special characters ( , ) and : but for input such as
> >> "re:" queryParser is producing error so i have escaped characters here.
> >> So it seems like i have two cases to deal with..
> >> Can anyone suggest me one generic way to deal with both the cases?
> >>
> >> Basically how to index and search string with escape characters will be
> >> my
> >> generalized question?
> >>
> >>
> >> Please help me
> >> miztaken
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/Issues-with-Special-Characters-tp19511428p19511428.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Issues-with-Special-Characters-tp19511428p19512277.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message