lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Language Specific Analyzer
Date Sat, 14 Nov 2015 17:10:34 GMT
Hi,

You cannot change the behavior of the predefined analyzers! But since Lucene 5 there is no need
to write your own subclass to define a custom analyzer: just use CustomAnalyzer and define,
via its fluent builder API, what your analysis chain should look like (see the example in the javadocs):

https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html

Please note: language-specific stemmers will not work correctly if the terms still contain
punctuation! Whether lowercasing is needed before the stemmer also depends on the stemmer.
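A minimal sketch of such a chain, assuming Lucene 5.x with the analyzers-common module on the classpath and the Snowball German stemmer as the language-specific stemmer (the class and method names are hypothetical, chosen only for illustration). It splits on whitespace only, so punctuation stays attached to the tokens, and it lowercases before stemming. The `main` method runs the same word with and without a trailing comma, which makes the caveat above visible: the comma survives into the token, so the stemmer does not reduce both forms to the same term.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class: whitespace tokenizer + lowercase + Snowball stemmer.
public class GermanWhitespaceAnalyzerDemo {

    // Whitespace tokenizer keeps punctuation attached to tokens; lowercasing
    // runs before stemming because the Snowball stemmers expect lowercase input.
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer(WhitespaceTokenizerFactory.class)
                .addTokenFilter(LowerCaseFilterFactory.class)
                .addTokenFilter(SnowballPorterFilterFactory.class, "language", "German")
                .build();
    }

    // Runs the text through the analyzer and collects the emitted terms.
    static List<String> analyze(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = buildAnalyzer()) {
            // Same word with and without a trailing comma: the comma stays in
            // the token, so the stemmer sees a different string.
            System.out.println(analyze(analyzer, "Häuser"));
            System.out.println(analyze(analyzer, "Häuser,"));
        }
    }
}
```

For Italian, changing the `"language"` parameter to `"Italian"` (or swapping in another stem filter factory) should be enough; the tokenizer and the lowercase-then-stem order stay the same.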

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: marco turchi [mailto:marco.turchi@gmail.com]
> Sent: Saturday, November 14, 2015 5:39 PM
> To: java-user@lucene.apache.org
> Subject: Language Specific Analyzer
> 
> Dear Users,
> I need to develop a language-specific analyzer that:
> 1) does not remove punctuation
> 2) lowercases and stems each term in the text.
> 
> I have tried some of the pre-implemented language analyzers (e.g. the German
> and Italian analyzers), but they remove punctuation. I'm not sure, but
> probably what I need is the whitespace analyzer instead of the standard
> analyzer.
> 
> Is there a way to force each language-specific analyzer to use the
> whitespace analyzer, or in general not to remove punctuation?
> 
> Thanks a lot!
> Marco


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

