lucene-dev mailing list archives

From "David Smiley (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7444) Remove StopFilter from StandardAnalyzer in Lucene-Core
Date Mon, 12 Sep 2016 01:37:20 GMT


David Smiley commented on LUCENE-7444:

Good to see there seems to be general agreement.

bq. But for everyone else, we must keep a simple core API (StandardAnalyzer ctor) taking an
optional stop words set in core.

+1 sounds good to me.
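
To illustrate the constructor shape being agreed on here, below is a minimal sketch using a toy class, not the real Lucene StandardAnalyzer. The class name `ToyStandardAnalyzer`, its `analyze` method, and the simple split-based tokenization are all assumptions made for illustration. The point is only that the no-argument constructor applies no stop words, while a caller can still pass an optional set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for illustration only; NOT the real Lucene StandardAnalyzer. */
final class ToyStandardAnalyzer {
    private final Set<String> stopWords;

    /** Default constructor: empty stop set, so nothing language-specific is removed. */
    ToyStandardAnalyzer() {
        this(Collections.<String>emptySet());
    }

    /** Constructor taking an optional, caller-supplied stop word set. */
    ToyStandardAnalyzer(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Splits on non-letter/non-digit characters, lowercases, then drops stop words. */
    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this sketch, `new ToyStandardAnalyzer().analyze("The Quick Fox")` keeps all three lowercased tokens, while passing a set containing "the" drops the first one.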

> Remove StopFilter from StandardAnalyzer in Lucene-Core
> ------------------------------------------------------
>                 Key: LUCENE-7444
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/other, modules/analysis
>    Affects Versions: 6.2
>            Reporter: Uwe Schindler
> Yonik said on LUCENE-7318:
> {quote}
> bq. I think it would make a good default for most Lucene users, and we should graduate
> it from the analyzers module into core, and make it the default for IndexWriter.
> This "StandardAnalyzer" is specific to English, as it removes English stopwords.
> That seems to be an odd choice now for a few reasons:
> - It was argued in the past (rather vehemently) that Solr should not prefer English in
> its default "text" field
> - AFAIK, removing stopwords is no longer considered best practice.
> Given that removal of English stopwords is the only thing that really makes this analyzer
> English-centric (and given the negative impact that can have on other languages), it seems
> like the stopword filter should be removed from StandardAnalyzer.
> {quote}
> When trying to fix the backwards incompatibility issues in LUCENE-7318, it looks like
> most of the unrelated code moved from the analysis module to core (with changed package
> names!!!! :( ) was related to word list loading, CharArraySets, and superclasses of
> StopFilter. If we follow Yonik's suggestion, we can revert all those changes. I agree with
> him: a "universal" analyzer should not have any language-specific stop words.
> The other thing is LowerCaseFilter, but I'd suggest simply adding a clone of it to Lucene
> core and leaving the analysis module self-contained.

This message was sent by Atlassian JIRA
