lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: StandardAnalyzer & e-mail addresses
Date Thu, 09 Oct 2003 09:11:47 GMT
Erik,

It seems that neither of us is very good at regular expressions :-)

This one finally works:


| <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-")
<ALPHANUM>)+ >
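
To see why the grouping and the quantifier both matter, the three variants of the rule from this thread can be approximated with java.util.regex. This is only a rough sketch, not the JavaCC tokenizer itself: it assumes <ALPHANUM> is simply one or more ASCII letters or digits, and the class name EmailRuleDemo and its members are made up for illustration. The precedence behaves the same way, though: alternation binds loosest, so without the extra parentheses only "_" is followed by <ALPHANUM>.

```java
import java.util.regex.Pattern;

public class EmailRuleDemo {
    // Assumption: ALPHANUM approximated as one or more ASCII letters or digits.
    static final String A = "[A-Za-z0-9]+";

    // Original rule: alternation is ungrouped, so it reads "." | "-" | ("_" ALPHANUM).
    static final Pattern ORIGINAL =
        Pattern.compile(A + "(\\.|-|_" + A + ")+@" + A + "(\\.|-" + A + ")+");

    // Second attempt: separators grouped, but "+" still demands a separator before "@".
    static final Pattern SECOND =
        Pattern.compile(A + "((\\.|-|_)" + A + ")+@" + A + "((\\.|-)" + A + ")+");

    // Final rule: separators grouped, and "*" makes the part before "@" optional.
    static final Pattern FINAL =
        Pattern.compile(A + "((\\.|-|_)" + A + ")*@" + A + "((\\.|-)" + A + ")+");

    // True only if the whole text matches the given rule.
    static boolean isEmail(Pattern rule, String text) {
        return rule.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "xyz@example.com", "first.last@example.com" };
        for (String s : samples) {
            System.out.println(s
                + " original=" + isEmail(ORIGINAL, s)
                + " second=" + isEmail(SECOND, s)
                + " final=" + isEmail(FINAL, s));
        }
    }
}
```

Under these assumptions only the final variant accepts "xyz@example.com" as a whole. The second variant accepts "first.last@example.com" but still rejects addresses with no separator before the "@", which fits the report that it did not change anything for that input.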


Christoph

Erik Hatcher wrote:
> Christoph,
> 
> Thanks for that, but unfortunately it didn't change things.  I guess it's
> time to push JavaCC learning into my to-learn queue :)
> 
>     Erik
> 
> 
> On Wednesday, October 8, 2003, at 01:31  PM, Christoph Goller wrote:
> 
>> I am not a JavaCC expert either. Maybe it's a precedence problem.
>> Could you try
>>
>> | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)+ "@" <ALPHANUM>
>> (("."|"-") <ALPHANUM>)+ >
>>
>>
>> Christoph
>>
>>
>> Erik Hatcher wrote:
>>
>>> I'm not JavaCC-savvy enough (yet), but it seems there is a flaw in 
>>> the StandardTokenizer and its determination of e-mail addresses.
>>> If I analyze "xyz@example.com", it splits into two tokens: 
>>> "xyz@example" and "com".  Shouldn't this rule:
>>>   // email addresses
>>> | <EMAIL: <ALPHANUM> ("."|"-"|"_" <ALPHANUM>)+ "@" <ALPHANUM>
>>> ("."|"-" <ALPHANUM>)+ >
>>> Be clever enough to keep the .com with it?  Perhaps some other 
>>> parsing is taking precedence?
>>> Thanks,
>>>     Erik
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>>
>> -- 
>> *****************************************************************
>> * Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
>> * Detego Software GmbH       Mobile: +49 179 1128469            *
>> * Keuslinstr. 13             Fax.:   +49 721 151516176          *
>> * 80798 München, Germany     Email:  goller@detego-software.de  *
>> *****************************************************************
