lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Potential bug in StandardTokenizerImpl
Date Tue, 27 Nov 2007 07:07:29 GMT

: If you pass "www.abc.com", the output is (www.abc.com,0,11,type=<HOST>)
: (which is correct in my opinion).
: However, if you pass "www.abc.com." (notice the extra '.' at the end), the
: output is (wwwabccom,0,12,type=<ACRONYM>).

see also...
http://www.nabble.com/Inconsistent-StandardTokenizer-behaviour-tf596059.html#a1593383
http://www.nabble.com/Standard-Analyzer---Host-and-Acronym-tf3620533.html#a10109926

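one hitch with changing this now is that it would potentially break
some searches in applications that have existing indexes built using
previous versions, since terms already in those indexes would no longer
match what a changed tokenizer produces at query time.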
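
to make the compatibility concern concrete, here is a minimal toy sketch. It is not Lucene code: `old_tokenize` and `new_tokenize` are hypothetical stand-ins that only mimic the two outputs quoted above, and the "fixed" behaviour shown is just one assumed possibility, not a proposed patch.

```python
# Toy model only: hypothetical stand-ins, not Lucene's StandardTokenizer.

def old_tokenize(text):
    """Mimic the reported behaviour: with a trailing '.', all dots are
    stripped (the <ACRONYM> case); otherwise the host is kept intact."""
    if text.endswith("."):
        return text.replace(".", "")
    return text


def new_tokenize(text):
    """One assumed 'fixed' behaviour: ignore the trailing '.' and keep
    the host intact (the <HOST> case)."""
    return text.rstrip(".")


# An index built with a previous version already stores the old term.
existing_index = {old_tokenize("www.abc.com.")}

# After the change, only the query side sees the new behaviour;
# the terms stored in the existing index stay as they were.
query_term = new_tokenize("www.abc.com.")

print(query_term in existing_index)                    # new query term vs. old index
print(old_tokenize("www.abc.com.") in existing_index)  # old query term vs. old index
```

under these assumptions the first lookup misses and the second one hits, which is the kind of breakage described above: the documents are still in the index, but they stop matching until the index is rebuilt.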



-Hoss



