lucene-solr-dev mailing list archives

From "Kumar Raja (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer
Date Wed, 02 Sep 2009 08:58:32 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750340#action_12750340 ]

Kumar Raja commented on SOLR-1336:
----------------------------------

Hi Robert,
Sorry, my bad. There was a mix-up of the Solr versions on my machine, which caused this error.

This tool is great. It works wonderfully and the test case pass rate is amazing! Is
there a similar tool for other Asian languages, say Japanese and Korean? Can this be customized
to accommodate those languages?

Is there any wiki page or document to help us understand how this tool works behind
the scenes? What exactly does the dictionary contain? Is it an ordinary Chinese dictionary
or some sort of customized/lemmatized dictionary? Also, how can one add new words to the
dictionary?

Thanks,
Kumar

> Add support for lucene's SmartChineseAnalyzer
> ---------------------------------------------
>
>                 Key: SOLR-1336
>                 URL: https://issues.apache.org/jira/browse/SOLR-1336
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Robert Muir
>         Attachments: SOLR-1336.patch, SOLR-1336.patch
>
>
> SmartChineseAnalyzer was contributed to lucene, it indexes simplified chinese text as words.
> if the factories for the tokenizer and word token filter are added to solr it can be
> used, although there should be a sample config or wiki entry showing how to apply the built-in
> stopwords list.
> this is because it doesn't contain actual stopwords, but must be used to prevent indexing
> punctuation...
> note: we did some refactoring/cleanup on this analyzer recently, so it would be much
> easier to do this after the next lucene update.
> it has also been moved out of -analyzers.jar due to size, and now builds in its own smartcn
> jar file, so that would need to be added if this feature is desired.
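
The point about the stopwords list can be illustrated with a small stand-alone sketch. It does not use the Lucene or Solr APIs; the class name, the stop set contents, and the sample tokens are all assumptions made up for illustration. It only shows the idea that a stop list holding punctuation, applied after word segmentation, keeps punctuation tokens out of the index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PunctuationStopSketch {
    // Hypothetical stop set: punctuation only, no real words.
    // \uFF0C is the fullwidth comma, \u3002 the ideographic full stop.
    static final Set<String> PUNCTUATION_STOPS = Set.of(",", ".", "\uFF0C", "\u3002");

    // Returns the tokens that are not in the stop set, preserving order.
    static List<String> dropStops(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!PUNCTUATION_STOPS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up segmenter output for a short sentence: words plus punctuation.
        List<String> segmented = List.of(
                "\u6211", "\u662F", "\uFF0C", "\u4E2D\u56FD", "\u4EBA", "\u3002");
        // Only the word tokens remain; both punctuation tokens are dropped.
        System.out.println(dropStops(segmented));
    }
}
```

In a real setup the filtering would be done by the analysis chain configured in Solr rather than by hand-written code like this.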

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

