lucene-java-user mailing list archives

Subject Re: Changing the Punctuation definition for StandardAnalyzer
Date Thu, 20 Dec 2007 19:21:45 GMT
Thanks Karl,

I would rather modify the lexer grammar. But where exactly is it
defined? After a quick look, it seems like the code below may be where it is done.
Since the ampersand has a decimal value of 38, I assumed that the
following step is taken when an ampersand is encountered:

              case 73:
                  if (curChar == 38)
                     jjstateSet[jjnewStateCnt++] = 74;
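
As a quick sanity check on the character codes involved, here is a minimal sketch in plain Java. The class name CharCodes and the method codeOf are made up for illustration and are not part of Lucene or JavaCC. It only confirms that 38 is the code for '&' and that '@' is 64. It says nothing about which generated state handles either character.

```java
// Hypothetical helper for illustration only; not part of Lucene or JavaCC.
final class CharCodes {
    // Returns the numeric code of a character, the same kind of value
    // that generated code compares against in tests like curChar == 38.
    public static int codeOf(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        System.out.println("'&' = " + codeOf('&'));
        System.out.println("'@' = " + codeOf('@'));
    }
}
```

So a check for '@' in code of this style would compare against 64 rather than 38.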

It's kind of complicated, so before I attempt to delve into it, I thought I
should ask whether I am looking in the right place.

Thanks again!

> On 20 Dec 2007, at 18:43, wrote:
>> I am using StandardAnalyzer for my indexes. Now I don't want to
>> search by whole email addresses, and want '@' to be treated as
>> punctuation too, because my users would rather search by user id
>> and/or host name to return all the matching email addresses than
>> search by the whole address. That way, they can still create a query
>> that will return email addresses anyway.
>> How do I make StandardAnalyzer treat '@' as punctuation?
> A quick and dirty solution is to introduce a TokenFilter that splits
> any token at @ and add it to the end of the chain of streams in
> StandardAnalyzer#tokenStream.
> It would probably be much more efficient if you modified the lexer
> grammar StandardTokenizer is generated from.
> --
> karl
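
The splitting step Karl describes can be sketched in plain Java, independent of Lucene. This is only an illustration of the logic: the class AtSignSplitter and the method splitAtSign are hypothetical names, and a real solution would wrap the same idea in a Lucene TokenFilter, whose API is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the splitting logic only; not a Lucene TokenFilter.
final class AtSignSplitter {
    // Splits every token at '@' and drops empty pieces, so that
    // "jane@example.org" becomes "jane" and "example.org".
    public static List<String> splitAtSign(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            int start = 0;
            // Walk one position past the end so the last piece is flushed too.
            for (int i = 0; i <= token.length(); i++) {
                if (i == token.length() || token.charAt(i) == '@') {
                    if (i > start) {
                        out.add(token.substring(start, i));
                    }
                    start = i + 1;
                }
            }
        }
        return out;
    }
}
```

Tokens without an '@' pass through unchanged, so applying this at the end of the chain should not affect other terms.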
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
