From: "Uwe Schindler (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Date: Sun, 11 Sep 2016 09:36:21 +0000 (UTC)
Subject: [jira] [Updated] (LUCENE-7444) Remove StopFilter from StandardAnalyzer in Lucene-Core
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm

[ https://issues.apache.org/jira/browse/LUCENE-7444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-7444:
----------------------------------
Description:
Yonik said on LUCENE-7318:
{quote}
bq.
I think it would make a good default for most Lucene users, and we should graduate it from the analyzers module into core, and make it the default for IndexWriter.

This "StandardAnalyzer" is specific to English, as it removes English stopwords.
That seems to be an odd choice now for a few reasons:
- It was argued in the past (rather vehemently) that Solr should not prefer English in its default "text" field
- AFAIK, removing stopwords is no longer considered best practice.

Given that removal of English stopwords is the only thing that really makes this analyzer English-centric (and given the negative impact that can have on other languages), it seems like the stopword filter should be removed from StandardAnalyzer.
{quote}

When trying to fix the backwards incompatibility issues in LUCENE-7318, it looks like most of the unrelated code moved from the analysis module to core (and changing package names!!!! :( ) was related to word list loading, CharArraySets, and superclasses of StopFilter. If we follow Yonik's suggestion, we can revert all those changes. I agree with him: a "universal" analyzer should not have any language-specific stop-words.

The other thing is LowerCaseFilter, but I'd suggest simply adding a clone of it to Lucene core and leaving the analysis module self-contained.

was:
Yonik said on LUCENE-7318:
{quote}
bq. I think it would make a good default for most Lucene users, and we should graduate it from the analyzers module into core, and make it the default for IndexWriter.

This "StandardAnalyzer" is specific to English, as it removes English stopwords.
That seems to be an odd choice now for a few reasons:
- It was argued in the past (rather vehemently) that Solr should not prefer English in its default "text" field
- AFAIK, removing stopwords is no longer considered best practice.
Given that removal of English stopwords is the only thing that really makes this analyzer English-centric (and given the negative impact that can have on other languages), it seems like the stopword filter should be removed from StandardAnalyzer.
{quote}

When trying to fix the backwards incompatibility issues in LUCENE-7318, it looks like most of the unrelated code moved from the analysis module to core (and changing package names!!!! :( ) was related to word list loading and superclasses of StopFilter. If we follow Yonik's suggestion, we can revert all those changes. I agree with him: a "universal" analyzer should not have any language-specific stop-words.

The other thing is LowerCaseFilter, but I'd suggest simply adding a clone of it to Lucene core and leaving the analysis module self-contained.

> Remove StopFilter from StandardAnalyzer in Lucene-Core
> ------------------------------------------------------
>
>                 Key: LUCENE-7444
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7444
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/other, modules/analysis
>    Affects Versions: 6.2
>            Reporter: Uwe Schindler
>
> Yonik said on LUCENE-7318:
> {quote}
> bq. I think it would make a good default for most Lucene users, and we should graduate it from the analyzers module into core, and make it the default for IndexWriter.
> This "StandardAnalyzer" is specific to English, as it removes English stopwords.
> That seems to be an odd choice now for a few reasons:
> - It was argued in the past (rather vehemently) that Solr should not prefer English in its default "text" field
> - AFAIK, removing stopwords is no longer considered best practice.
> Given that removal of English stopwords is the only thing that really makes this analyzer English-centric (and given the negative impact that can have on other languages), it seems like the stopword filter should be removed from StandardAnalyzer.
> {quote}
> When trying to fix the backwards incompatibility issues in LUCENE-7318, it looks like most of the unrelated code moved from the analysis module to core (and changing package names!!!! :( ) was related to word list loading, CharArraySets, and superclasses of StopFilter. If we follow Yonik's suggestion, we can revert all those changes. I agree with him: a "universal" analyzer should not have any language-specific stop-words.
> The other thing is LowerCaseFilter, but I'd suggest simply adding a clone of it to Lucene core and leaving the analysis module self-contained.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
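
Illustration of the "negative impact on other languages" point in the description: a minimal sketch in plain Java, not Lucene code. The class StopwordDemo and its analyze method are hypothetical names made up for this example, the tokenization is a crude split on non-letters, and the stopword set is a small hand-picked subset assumed to overlap with the default English list. It only shows the effect being discussed: an English stop filter can drop meaningful tokens from non-English text, and can even empty out an English phrase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for an analysis chain (tokenize, lowercase, optional stop filter).
// Hypothetical code for illustration only; it does not use any Lucene API.
class StopwordDemo {
    // Hand-picked subset, assumed to overlap with the default English stopword list.
    static final Set<String> ENGLISH_STOPWORDS = new HashSet<>(
        Arrays.asList("a", "an", "be", "in", "no", "not", "or", "the", "to", "was"));

    // Split on runs of non-letter characters, lowercase each token,
    // and drop English stopwords only when removeStopwords is true.
    static List<String> analyze(String text, boolean removeStopwords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // leading separator produces an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (removeStopwords && ENGLISH_STOPWORDS.contains(token)) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // English phrase made up entirely of stopwords: nothing is left to index.
        System.out.println(analyze("To be or not to be", true));   // prints []
        // German "Was" (what) collides with the English stopword "was".
        System.out.println(analyze("Was ist das?", true));         // prints [ist, das]
        // Without the stop filter, every token survives.
        System.out.println(analyze("Was ist das?", false));        // prints [was, ist, das]
    }
}
```

With the stop filter switched off, the chain is just tokenization plus lowercasing, which is language-neutral in the sense argued for above.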