lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yahootintin.11533...@bloglines.com
Subject Re: Inconsistent StandardTokenizer behaviour
Date Tue, 22 Nov 2005 00:44:55 GMT
Sorry for the bad looking table.  Retrying...

 input string - output token
(output type)
1. 1.2 - 1.2 (<HOST>)
2. 1.2. - 1.2 (<HOST>)
3. a.b - a.b
(<HOST>)
4. a.b. - a.b. (<ACRONYM>)
5. www.apache.org - www.apache.org (<HOST>)

6. www.apache.org. - www.apache.org. (<ACRONYM>)

--- java-user@lucene.apache.org
wrote:
This is the results for the StandardTokenizer:
>    input - output
token -
> output type
> 1. 1.2   - 1.2          - <HOST>
> 2. 1.2.  - 1.2
         - <HOST>
> 
> 3. a.b   - a.b          - <HOST>
> 4. a.b.  - a.b.
        - <ACRONYM>
> 5.
> www.apache.org  - www.apache.org  - <HOST>
>
6. www.apache.org. - www.apache.org.
> - <ACRONYM>
> 
> Number 6 should
still be a <HOST> type, shouldn't it?  This
> causes problems for the StandardFilter.
 Why is it saying its an <ACRONYM>?
> 
> ---------------------------------------------------------------------

> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For
additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message