lucene-java-user mailing list archives

From Anna Hunecke <annahune...@yahoo.de>
Subject Strange behaviour of StandardTokenizer
Date Thu, 17 Jun 2010 13:31:34 GMT
Hi!

I ran into some strange behaviour in the StandardTokenizer: terms containing a '-' are
tokenized differently depending on the other characters in the term.
For example, the term 'nl-lt' is split into two tokens, 'nl' and 'lt',
whereas the term 'nl-lt0' is kept as the single token 'nl-lt0'.
Is this a bug or a feature, and is there a way to avoid it?
I'm using Lucene 3.0.0.
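
To make the pattern concrete, here is a small plain-Java sketch of a rule that would produce exactly the two results above: hyphen-joined segments stay together when every other segment contains a digit, and are split otherwise. This is only a guess at the rule, not Lucene's actual code. It does not call any Lucene API, and the class and method names (HyphenRuleSketch, hasDigit, staysJoined, tokenize) are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the observed behaviour; NOT the StandardTokenizer itself.
final class HyphenRuleSketch {

    // True if the segment contains at least one digit.
    static boolean hasDigit(String segment) {
        for (int i = 0; i < segment.length(); i++) {
            if (Character.isDigit(segment.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Assumed rule: segments[from..to] (inclusive, two or more, none empty) stay
    // joined if all segments at even offsets, or all at odd offsets, contain a digit.
    static boolean staysJoined(String[] segments, int from, int to) {
        if (to <= from) {
            return false;
        }
        boolean evenOk = true;
        boolean oddOk = true;
        for (int i = from; i <= to; i++) {
            if (segments[i].isEmpty()) {
                return false;
            }
            if (!hasDigit(segments[i])) {
                if ((i - from) % 2 == 0) {
                    evenOk = false;
                } else {
                    oddOk = false;
                }
            }
        }
        return evenOk || oddOk;
    }

    // Splits a term on '-' and greedily re-joins the longest run that satisfies the rule.
    static List<String> tokenize(String term) {
        String[] segments = term.split("-");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < segments.length) {
            if (segments[i].isEmpty()) {
                i++; // skip empty pieces produced by leading or doubled hyphens
                continue;
            }
            int end = i;
            for (int j = segments.length - 1; j > i; j--) {
                if (staysJoined(segments, i, j)) {
                    end = j;
                    break;
                }
            }
            tokens.add(String.join("-", Arrays.copyOfRange(segments, i, end + 1)));
            i = end + 1;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("nl-lt"));  // prints [nl, lt]
        System.out.println(tokenize("nl-lt0")); // prints [nl-lt0]
    }
}
```

If the real tokenizer follows a rule like this, the trailing '0' in 'nl-lt0' would be what keeps the term in one piece.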

Best,
Anna


