lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yahootintin.11533...@bloglines.com
Subject Re: Inconsistent StandardTokenizer behaviour
Date Tue, 22 Nov 2005 18:43:47 GMT
Cool, I'll take a look at fixing this.

--- java-user@lucene.apache.org
wrote:

> On 21 Nov 2005, at 19:39, yahootintin.11533894@bloglines.com wrote:

> > This is the results for the StandardTokenizer:
> >    input - output
token -
> > output type
> > 1. 1.2   - 1.2          - <HOST>
> > 2. 1.2.
 - 1.2          - <HOST>
> >
> > 3. a.b   - a.b          - <HOST>
> > 4.
a.b.  - a.b.         - <ACRONYM>
> > 5.
> > www.apache.org  - www.apache.org
 - <HOST>
> > 6. www.apache.org. - www.apache.org.
> > - <ACRONYM>
> >

> > Number 6 should still be a <HOST> type, shouldn't it?  This
> > causes
problems for the StandardFilter.  Why is it saying its an  
> > <ACRONYM>?

> 
> Because it's grammar is imperfect?!
> 
> The trailing '.' is throwing
it off from what you expect.  We'd  
> certainly welcome fixes to StandardTokenizer.jj
in this regard.
> 
> 	Erik
> 
> 
> ---------------------------------------------------------------------

> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For
additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message