lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] Commented: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer
Date Sat, 08 Aug 2009 21:58:14 GMT


Yonik Seeley commented on SOLR-1336:

Thanks Robert!
Are the stopwords (words="org/apache/lucene/analysis/cn/stopwords.txt") being loaded directly
from the jar?  If so, a comment to that effect might prevent some confusion.
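
As a rough illustration of the kind of comment meant here, a schema.xml field type could look like the sketch below. The field type name text_zh is hypothetical, and the two SmartChinese factory class names are assumptions that may differ from the ones in the attached patch; only the words path is taken from this message. The comment also assumes that Solr's resource loader falls back to the classpath when the file is not found in the conf directory.

```xml
<!-- Sketch only: "text_zh" is a made-up name and the SmartChinese
     factory class names are assumed, not taken from the patch. -->
<fieldType name="text_zh" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <!-- stopwords.txt is not a file in conf/. Assuming the resource
         loader falls back to the classpath, it is read from inside
         the smartcn jar. -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/stopwords.txt"/>
  </analyzer>
</fieldType>
```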

Do you happen to know what the memory footprint of this analyzer is if it's used?  I assume
the dictionaries will get loaded on the first use.

Might be cool to add a Chinese field to example/exampledocs/solr.xml... or maybe there should
be an international.xml doc where we could add a few different languages?
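
A document along those lines might look like the following sketch. The file name international.xml comes from the suggestion above; the field names (id, name, text_zh) and the sample values are hypothetical and would have to match fields declared in the example schema.

```xml
<!-- Hypothetical international.xml: one <doc> per language.
     Field names and values are illustrative only. -->
<add>
  <doc>
    <field name="id">INTL-ZH</field>
    <field name="name">Simplified Chinese sample</field>
    <field name="text_zh">我是中国人</field>
  </doc>
</add>
```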

> Add support for lucene's SmartChineseAnalyzer
> ---------------------------------------------
>                 Key: SOLR-1336
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Robert Muir
>         Attachments: SOLR-1336.patch
> SmartChineseAnalyzer was contributed to lucene, it indexes simplified chinese text as
> if the factories for the tokenizer and word token filter are added to solr it can be
> used, although there should be a sample config or wiki entry showing how to apply the built-in
> stopwords list.
> this is because it doesn't contain actual stopwords, but must be used to prevent indexing
> note: we did some refactoring/cleanup on this analyzer recently, so it would be much
> easier to do this after the next lucene update.
> it has also been moved out of -analyzers.jar due to size, and now builds in its own smartcn
> jar file, so that would need to be added if this feature is desired.

