lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Inconsistent StandardTokenizer behaviour
Date Tue, 22 Nov 2005 01:56:16 GMT

On 21 Nov 2005, at 19:39, yahootintin.11533894@bloglines.com wrote:
> These are the results for the StandardTokenizer:
>
>    input           - output token    - output type
> 1. 1.2             - 1.2             - <HOST>
> 2. 1.2.            - 1.2             - <HOST>
> 3. a.b             - a.b             - <HOST>
> 4. a.b.            - a.b.            - <ACRONYM>
> 5. www.apache.org  - www.apache.org  - <HOST>
> 6. www.apache.org. - www.apache.org. - <ACRONYM>
>
> Number 6 should still be a <HOST> type, shouldn't it?  This
> causes problems for the StandardFilter.  Why is it saying it's an
> <ACRONYM>?

Because its grammar is imperfect?!

The trailing '.' is throwing it off from what you expect.  We'd  
certainly welcome fixes to StandardTokenizer.jj in this regard.
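
As a rough illustration of how a trailing '.' can flip the type, here is a
minimal sketch. It is not the real StandardTokenizer.jj grammar: the two
regular expressions are simplified, ASCII-only assumptions about the shape
of the ACRONYM and HOST rules, and the class and method names
(TokenTypeSketch, classify) are hypothetical. It also assumes the lexer
prefers the longest match at the current position, which is how
JavaCC-generated tokenizers choose between competing rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for two token rules; NOT the real StandardTokenizer.jj grammar.
class TokenTypeSketch {
    // Assumed shape of ACRONYM: runs of letters, each followed by a dot, at least two runs.
    static final Pattern ACRONYM = Pattern.compile("[A-Za-z]+\\.(?:[A-Za-z]+\\.)+");
    // Assumed shape of HOST: alphanumeric runs separated by dots, with no trailing dot.
    static final Pattern HOST = Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    // Returns {matched text, type} for the longest match anchored at the start
    // of the input, or null if neither rule matches. On a tie, the rule listed
    // first (ACRONYM) is kept.
    static String[] classify(String input) {
        Pattern[] rules = {ACRONYM, HOST};
        String[] names = {"<ACRONYM>", "<HOST>"};
        String[] best = null;
        for (int i = 0; i < rules.length; i++) {
            Matcher m = rules[i].matcher(input);
            if (m.lookingAt() && (best == null || m.end() > best[0].length())) {
                best = new String[] {m.group(), names[i]};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] inputs = {"1.2", "1.2.", "a.b", "a.b.", "www.apache.org", "www.apache.org."};
        for (String in : inputs) {
            String[] r = classify(in);
            System.out.println(in + " - " + r[0] + " - " + r[1]);
        }
    }
}
```

Under these assumptions, "www.apache.org." matches the ACRONYM pattern in
full (15 characters), while the HOST pattern can only cover
"www.apache.org" (14 characters), so the longer ACRONYM match wins. "1.2."
stays a <HOST> because digits cannot start an ACRONYM in this sketch.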

	Erik



