lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4284) RFE: stopword filter without lowercase side-effect
Date Thu, 02 Aug 2012 14:39:02 GMT


Robert Muir commented on LUCENE-4284:

Really all these analyzers are just simple examples and not intended to solve all use cases.

You can just make your own that doesn't lowercase at all with hardly any code, and
if you want to control the case sensitivity of the stopword set, do that on the stop set itself
(pass the boolean to StopFilter.makeStopSet, etc.).

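    // matchVersion and stopwords are assumed to be defined by the caller
    Analyzer a = new ReusableAnalyzerBase() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // LetterTokenizer splits at non-letters and, unlike LowerCaseTokenizer, leaves case untouched
        Tokenizer source = new LetterTokenizer(matchVersion, reader);
        return new TokenStreamComponents(source, new StopFilter(matchVersion, source, stopwords));
      }
    };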

Otherwise we would have to add options to every Analyzer for everyone's possible use cases,
which is too many (we will never make everyone happy).
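
To make the distinction concrete, here is a minimal plain-Java sketch of the behaviour being asked for: stopwords are matched case-insensitively, but the surviving tokens keep their original case. It deliberately uses no Lucene classes; StopwordSketch, tokenize and removeStopwords are hypothetical names for illustration, and the letter-based splitting only approximates what a letter tokenizer does. In Lucene itself the matching behaviour would come from how the stop set is built, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this class and its methods are hypothetical, not Lucene API.
class StopwordSketch {

  // Split on any non-letter character, keeping each token's original case.
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isLetter(c)) {
        current.append(c);
      } else if (current.length() > 0) {
        tokens.add(current.toString());
        current.setLength(0);
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  // Drop stopwords. ignoreCase changes how tokens are matched against the
  // stop set; it never changes the tokens that are returned.
  static List<String> removeStopwords(List<String> tokens, Set<String> stopwords, boolean ignoreCase) {
    Set<String> lookup = new HashSet<>();
    for (String word : stopwords) {
      lookup.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
    }
    List<String> kept = new ArrayList<>();
    for (String token : tokens) {
      String key = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
      if (!lookup.contains(key)) {
        kept.add(token);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Set<String> stopwords = new HashSet<>(Arrays.asList("the", "a"));
    List<String> tokens = tokenize("The Quick brown fox, a Lazy dog");
    // Case-insensitive matching: "The" is removed, "Quick" and "Lazy" keep their capitals.
    System.out.println(removeStopwords(tokens, stopwords, true));
    // Case-sensitive matching: "The" does not equal "the", so it survives.
    System.out.println(removeStopwords(tokens, stopwords, false));
  }
}
```

The point of the sketch is that lowercasing is only ever applied to the lookup key, so case handling stays a property of the stop set rather than of the token stream.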

> RFE: stopword filter without lowercase side-effect
> --------------------------------------------------
>                 Key: LUCENE-4284
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Sam Halliday
>            Priority: Minor
> It would appear that accept()-time lowercasing of Tokens is not favourable anymore, due
> to the @Deprecation of the only constructor in StopFilter that allows this.
> Please support some way to allow stop-word removal without lowercasing the output:
