lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: StandardAnalyzer & e-mail addresses
Date Wed, 08 Oct 2003 17:31:07 GMT
I am not a JavaCC expert either. Maybe it's a precedence problem.
Could you try:

| <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)+ "@" <ALPHANUM> (("."|"-") <ALPHANUM>)+ >
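To check the precedence idea, here is a minimal sketch that approximates both EMAIL rules with java.util.regex, which, like JavaCC, binds a sequence tighter than "|". It assumes <ALPHANUM> can be approximated as [A-Za-z0-9]+ (the real grammar's letter definition is broader), the class and method names are hypothetical, and it only tests whole-string matches, not the tokenizer's longest-match behaviour.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: regex approximations of the two EMAIL rules.
public class EmailRuleDemo {
    // Assumption: <ALPHANUM> approximated as one or more ASCII letters/digits.
    static final String ALPHANUM = "[A-Za-z0-9]+";

    // Quoted rule: ("."|"-"|"_" <ALPHANUM>)+ reads as ("." | "-" | ("_" <ALPHANUM>))+
    // because the sequence "_" <ALPHANUM> binds tighter than "|". Same for the domain.
    static final Pattern QUOTED = Pattern.compile(
            ALPHANUM + "(?:\\.|-|_" + ALPHANUM + ")+"
            + "@" + ALPHANUM + "(?:\\.|-" + ALPHANUM + ")+");

    // Suggested rule: the separator choice is grouped first, then <ALPHANUM> follows.
    static final Pattern SUGGESTED = Pattern.compile(
            ALPHANUM + "(?:(?:\\.|-|_)" + ALPHANUM + ")+"
            + "@" + ALPHANUM + "(?:(?:\\.|-)" + ALPHANUM + ")+");

    public static boolean quoted(String s) {
        return QUOTED.matcher(s).matches();
    }

    public static boolean suggested(String s) {
        return SUGGESTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "john.doe@example.com",
            "john_doe@example.com",
            "john_doe@example.",
            "xyz@example.com"
        };
        for (String s : inputs) {
            System.out.printf("%-22s quoted=%b suggested=%b%n", s, quoted(s), suggested(s));
        }
    }
}
```

Under this approximation the quoted rule never accepts a dotted domain: its domain part can consume a bare trailing "." but never ".com", which fits the symptom of ".com" being split off. The regrouped rule accepts "john.doe@example.com", but its local part still requires at least one separator before the "@", so "xyz@example.com" is rejected by both; a `*` instead of that first `+` would be needed to accept a plain local part.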


Christoph


Erik Hatcher wrote:
> I'm not JavaCC-savvy enough (yet), but it seems there is a flaw in the 
> StandardTokenizer and its determination of e-mail addresses.
> 
> If I analyze "xyz@example.com", it splits into two tokens: "xyz@example" 
> and "com".  Shouldn't this rule:
> 
>   // email addresses
> | <EMAIL: <ALPHANUM> ("."|"-"|"_" <ALPHANUM>)+ "@" <ALPHANUM> ("."|"-" <ALPHANUM>)+ >
> 
> Be clever enough to keep the .com with it?  Perhaps some other parsing 
> is taking precedence?
> 
> Thanks,
>     Erik
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 

-- 
*****************************************************************
* Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
* Detego Software GmbH       Mobile: +49 179 1128469            *
* Keuslinstr. 13             Fax.:   +49 721 151516176          *
* 80798 München, Germany     Email:  goller@detego-software.de  *
*****************************************************************

