lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: ClassicAnalyzer Behavior on accent character
Date Thu, 26 Oct 2017 19:02:33 GMT


Classic is ... "classic" ... it exists largely for historical purposes, to 
provide a tokenizer that does exactly what its javadocs say it does 
(regarding punctuation, "product numbers", and email addresses), so that 
people who depend on that behavior can continue to rely on it.

Standard is ... "standard" ... it implements the Unicode Standard's text 
segmentation rules (the word-break rules of Unicode Standard Annex #29).
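
To get a rough feel for what rule-based Unicode word segmentation does with 
accented text, here is a minimal sketch that uses only the JDK's 
java.text.BreakIterator. This is an assumption-laden stand-in, not Lucene 
code: BreakIterator is a separate implementation whose rules may differ in 
detail from StandardTokenizer's, and the class name WordBoundarySketch and 
the helper words() are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundarySketch {

    // Splits text at the JDK's word boundaries and keeps only the spans that
    // start with a letter or digit, which drops whitespace and punctuation.
    // Hypothetical helper for illustration; not part of Lucene.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String span = text.substring(start, end);
            if (Character.isLetterOrDigit(span.codePointAt(0))) {
                out.add(span);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Precomposed accent: U+00E9 is a single letter.
        System.out.println(words("caf\u00e9 au lait"));
        // Decomposed accent: 'e' followed by the combining acute U+0301.
        System.out.println(words("cafe\u0301 au lait"));
    }
}
```

Running the same two inputs through ClassicTokenizer and StandardTokenizer 
and comparing the emitted terms is the direct way to see where the two 
differ for your own data.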


: Date: Fri, 20 Oct 2017 18:58:35 +0530
: From: Chitra <chithu.r111@gmail.com>
: Reply-To: java-user@lucene.apache.org
: To: Lucene Users <java-user@lucene.apache.org>
: Subject: Re: ClassicAnalyzer Behavior on accent character
: 
: Hi,
:          I found the difference and understand the behavior of both
: tokenizers appropriately.
: 
: Could you please suggest me which one is the better to use
: ClassicTokenizer/StandardTokenizer?
: 
: -- 
: Regards,
: Chitra
: 

-Hoss
http://www.lucidworks.com/


