lucene-commits mailing list archives

From rm...@apache.org
Subject svn commit: r1242557 - in /lucene/dev/trunk: modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt solr/example/solr/conf/lang/stopwords_ja.txt
Date Thu, 09 Feb 2012 22:17:45 GMT
Author: rmuir
Date: Thu Feb  9 22:17:44 2012
New Revision: 1242557

URL: http://svn.apache.org/viewvc?rev=1242557&view=rev
Log:
SOLR-3115: improve japanese stopwords.txt description

Modified:
    lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt
    lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt

Modified: lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt?rev=1242557&r1=1242556&r2=1242557&view=diff
==============================================================================
--- lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt (original)
+++ lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt Thu Feb  9 22:17:44 2012
@@ -1,14 +1,19 @@
 #
 # This file defines a stopword set for Japanese.
 #
-# The set is made up hand-picked frequent terms from taken from segmented Japanese
-# Wikipedia.  Punctuation characters and frequent kanji have mostly been left out.
+# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
+# Punctuation characters and frequent kanji have mostly been left out.  See LUCENE-3745
+# for frequency lists, etc. that can be useful for making your own set (if desired)
 #
-# There is an overlap between these stopwords and the terms removed when used in
-# combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
+# Note that there is an overlap between these stopwords and the terms stopped when used
+# in combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
 # that comments are not allowed on the same line as stopwords.
 #
-# See LUCENE-3745 for frequency lists, etc. that can be useful for making your own set.
+# Also note that stopping is done in a case-insensitive manner.  Change your StopFilter
+# configuration if you need case-sensitive stopping.  Lastly, note that stopping is done
+# using the same character width as the entries in this file.  Since this StopFilter is
+# normally done after a CJKWidthFilter in your chain, you would usually want your romaji
+# entries to be in half-width and your kana entries to be in full-width.
 #
 の
 に

Modified: lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt?rev=1242557&r1=1242556&r2=1242557&view=diff
==============================================================================
--- lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt (original)
+++ lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt Thu Feb  9 22:17:44 2012
@@ -1,14 +1,19 @@
 #
 # This file defines a stopword set for Japanese.
 #
-# The set is made up hand-picked frequent terms from taken from segmented Japanese
-# Wikipedia.  Punctuation characters and frequent kanji have mostly been left out.
+# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
+# Punctuation characters and frequent kanji have mostly been left out.  See LUCENE-3745
+# for frequency lists, etc. that can be useful for making your own set (if desired)
 #
-# There is an overlap between these stopwords and the terms removed when used in
-# combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
+# Note that there is an overlap between these stopwords and the terms stopped when used
+# in combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
 # that comments are not allowed on the same line as stopwords.
 #
-# See LUCENE-3745 for frequency lists, etc. that can be useful for making your own set.
+# Also note that stopping is done in a case-insensitive manner.  Change your StopFilter
+# configuration if you need case-sensitive stopping.  Lastly, note that stopping is done
+# using the same character width as the entries in this file.  Since this StopFilter is
+# normally done after a CJKWidthFilter in your chain, you would usually want your romaji
+# entries to be in half-width and your kana entries to be in full-width.
 #
 の
 に
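
The new header comment makes three points: comments must sit on their own lines, matching is case-insensitive, and entries must use the character width that tokens have once they reach the stop filter. The sketch below illustrates those points in plain Python. It is not Lucene code: `load_stopwords`, `fold_width`, and `analyze` are hypothetical helpers, Unicode NFKC normalization is only an approximation of what CJKWidthFilter does, and the `the` entry and sample tokens are made up for the example.

```python
import unicodedata


def load_stopwords(lines):
    """Parse stopword-file lines, skipping blanks and '#' comment lines.

    A whole line is one entry, so a trailing comment would become part
    of the stopword itself (hence "no comments on the same line").
    """
    words = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # Lowercase entries so that matching is case-insensitive.
        words.add(entry.lower())
    return words


def fold_width(token):
    """Approximate width folding (assumption: NFKC stands in for CJKWidthFilter).

    Full-width romaji becomes half-width; half-width kana becomes full-width.
    """
    return unicodedata.normalize("NFKC", token)


def analyze(tokens, stopwords):
    """Fold width first, then drop stopwords case-insensitively."""
    folded = [fold_width(t) for t in tokens]
    return [t for t in folded if t.lower() not in stopwords]


# Hypothetical file contents: kana in full-width, romaji in half-width.
SAMPLE_FILE = ["#", "# sample entries", "#", "の", "に", "the"]
stopwords = load_stopwords(SAMPLE_FILE)

# Full-width "ＴＨＥ" and half-width "ﾗｰﾒﾝ" are folded before matching.
tokens = ["東京", "の", "ＴＨＥ", "ﾗｰﾒﾝ", "に"]
filtered = analyze(tokens, stopwords)
print(filtered)
```

Because folding happens before the stopword lookup in this sketch, a full-width romaji entry in the file would never match a folded token, which is the reason the comment recommends half-width romaji and full-width kana entries.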
