lucene-java-user mailing list archives

From Daniel Bigham
Subject analyzers-common VS analyzers-icu
Date Wed, 01 Jun 2016 16:56:59 GMT

I recently set up my code to choose the appropriate analyzer from
analyzers-common depending on the language of the user's index/field.
I then extended the existing source code so that, for any language, I
can turn things like stemming and case sensitivity on or off.

Today I discovered analyzers-icu, and I don't understand how it
relates to analyzers-common.

Are they drop-in replacements for each other?  Are there features in one
that aren't available in the other?  What are the pros and cons of using
one or the other?

In a nutshell, the features I care about are:

- The ability to specify a language and have tokenization performed 
according to that language
- Obviously the more languages supported the better
- The ability to turn on/off stemming for any language (implemented 
myself for analyzers-common)
- The ability to turn on/off case sensitivity for any language 
(implemented myself for analyzers-common)
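
To make the list above concrete, here is a minimal plain-JDK sketch of the kind of per-language switches I mean. It does not use Lucene at all: AnalysisOptions and SimpleAnalyzer are hypothetical names invented for illustration, java.text.BreakIterator stands in for a real language-aware tokenizer, and the "stemmer" is just any function the caller passes in.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical options holder; none of these names come from Lucene.
final class AnalysisOptions {
    final Locale locale;                 // language used for tokenization and case folding
    final boolean caseSensitive;         // false = lowercase every token
    final UnaryOperator<String> stemmer; // null = stemming switched off

    AnalysisOptions(Locale locale, boolean caseSensitive, UnaryOperator<String> stemmer) {
        this.locale = locale;
        this.caseSensitive = caseSensitive;
        this.stemmer = stemmer;
    }
}

// Hypothetical stand-in for an analyzer: tokenize, then apply the optional steps.
final class SimpleAnalyzer {
    static List<String> analyze(String text, AnalysisOptions opts) {
        // Locale-aware word boundaries from the JDK (an assumption, not a Lucene tokenizer).
        BreakIterator words = BreakIterator.getWordInstance(opts.locale);
        words.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String token = text.substring(start, end);
            // Skip segments that are only whitespace or punctuation.
            if (!Character.isLetterOrDigit(token.codePointAt(0))) {
                continue;
            }
            if (!opts.caseSensitive) {
                token = token.toLowerCase(opts.locale);
            }
            if (opts.stemmer != null) {
                token = opts.stemmer.apply(token);
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```

With these assumptions, calling SimpleAnalyzer.analyze("Running Dogs", new AnalysisOptions(Locale.ENGLISH, false, null)) lowercases the tokens but leaves them unstemmed. What I am asking is whether analyzers-common or analyzers-icu is the better base for switches like these.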

