lucene-java-user mailing list archives

From tare...@controldocs.com
Subject Re: Changing the Punctuation definition for StandardAnalyzer
Date Thu, 20 Dec 2007 19:21:45 GMT
Thanks Karl,

I would rather modify the lexer grammar, but where exactly is it
defined? After a quick look, it seems that
StandardTokenizerTokenManager.java may be where this is done.
Since the ampersand has a decimal value of 38, I was assuming that the
following step is taken when an ampersand is encountered:

=============
              case 73:
                  if (curChar == 38)
                     jjstateSet[jjnewStateCnt++] = 74;
                  break;
=============
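
For reference, a quick sanity check of the character codes involved. This is only a throwaway sketch (the class name CharCodes is my own, not anything from Lucene), and it says nothing about whether case 73 is the right state to change. It just shows which number a comparable check for '@' would use:

```java
// Throwaway sketch, not Lucene code: prints the decimal values that
// comparisons such as `curChar == 38` in the generated code rely on.
class CharCodes {
    // Decimal code of a character, as compared against curChar.
    static int code(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        System.out.println("'&' = " + code('&')); // prints '&' = 38
        System.out.println("'@' = " + code('@')); // prints '@' = 64
    }
}
```

So any branch handling '@' would presumably compare curChar against 64 rather than 38.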

The generated state machine is fairly complicated, so before I attempt
to delve into it I thought I should ask whether I am looking in the
right place.
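
In the meantime, the quick and dirty route quoted below (splitting each token at '@') might look roughly like this. It is a minimal sketch in plain Java with no Lucene dependency; AtSignSplitter and split are hypothetical names of mine, and a real TokenFilter would additionally have to queue the extra tokens and adjust their offsets:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits one token's text at '@'.
class AtSignSplitter {
    static List<String> split(String term) {
        List<String> parts = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i <= term.length(); i++) {
            // Cut at every '@' and once more at the end of the term.
            if (i == term.length() || term.charAt(i) == '@') {
                if (i > start) {
                    parts.add(term.substring(start, i)); // skip empty pieces
                }
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("user@example.com")); // prints [user, example.com]
    }
}
```

Terms without an '@' come back unchanged as a single piece, so such a filter could be applied to every token in the stream.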

Thanks again!
Tareque



>
> On 20 Dec 2007, at 18:43, tareque@controldocs.com wrote:
>
>> I am using StandardAnalyzer for my indexes. Now I don't want to
>> search by whole email addresses, and want '@' to be treated as
>> punctuation too, because my users would rather search for the user id
>> and/or the host name and get back all the matching email addresses
>> than search by the whole address. That way, they can still create a
>> query that will return email addresses anyway.
>>
>> How do I make StandardAnalyzer treat '@' as punctuation?
>
> A quick and dirty solution is to introduce a TokenFilter that splits
> any token at @ and add it to the end of the chain of streams in
> StandardAnalyzer#tokenStream.
>
> It would probably be much more efficient if you modified the lexer
> grammar that StandardTokenizer is generated from.
>
> --
> karl
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

