lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Updated: (LUCENE-1817) it is impossible to use a custom dictionary for SmartChineseAnalyzer
Date Thu, 27 Aug 2009 16:50:59 GMT


Robert Muir updated LUCENE-1817:

    Attachment: LUCENE-1817.patch

Here is a javadocs-only patch that I think is the best solution.

This is because I created several custom dictionaries and found:
1) it will be difficult to support this dictionary format for a number of reasons
2) the dictionary format is limited to GB2312 encoding, and will not support things like traditional Chinese
3) even when creating a correct file in the correct format, there are many assumptions about
   what should be in the dictionary, especially in things like WordDictionary.expandDelimiterData.
   If these assumptions are not met, things like infinite loops occur.

I recommend we instead remove the javadocs describing how to use a custom dictionary.
This patch also expands the EXPERIMENTAL wording from just APIs to both APIs and file formats.
In the future we should refactor and use a Unicode-based format.
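
To illustrate the GB2312 limitation mentioned above, here is a minimal sketch using only the JDK charset API. It is not part of the patch; the class name and sample strings are made up for illustration, and it assumes the running JVM ships the GB2312 charset (it is an extended charset, so that is not guaranteed).

```java
import java.nio.charset.Charset;

public class EncodingCoverageSketch {

    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String simplified = "\u6c49\u8bed";   // "Chinese language" in simplified characters
        String traditional = "\u6f22\u8a9e";  // the same word in traditional characters

        // Assumption: GB2312 is an optional (extended) charset, so check before using it.
        if (Charset.isSupported("GB2312")) {
            System.out.println("GB2312 simplified:  " + canEncode("GB2312", simplified));
            System.out.println("GB2312 traditional: " + canEncode("GB2312", traditional));
        }
        // A Unicode encoding such as UTF-8 covers both forms.
        System.out.println("UTF-8 simplified:   " + canEncode("UTF-8", simplified));
        System.out.println("UTF-8 traditional:  " + canEncode("UTF-8", traditional));
    }
}
```

A dictionary file stored in a Unicode encoding would not hit this restriction, which is the motivation for the refactoring suggested above.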

I won't do anything here without some consensus that this is the right way to go,
but I think we should do this in 2.9.

> it is impossible to use a custom dictionary for SmartChineseAnalyzer
> --------------------------------------------------------------------
>                 Key: LUCENE-1817
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments:, LUCENE-1817-mark-cn-experimental.patch, LUCENE-1817.patch,
> It is not possible to use a custom dictionary, even though there is a lot of code and javadocs to allow this.
> This is because the custom dictionary is only loaded if the built-in one cannot be loaded (which is, of course, in the jar file and should load):
> {code}
> public synchronized static WordDictionary getInstance() {
>     if (singleInstance == null) {
>       singleInstance = new WordDictionary(); // load from jar file
>       try {
>         singleInstance.load();
>       } catch (IOException e) {
>         // loading from the jar file must fail before the AnalyzerProfile
>         // (where this can be configured) is checked
>         String wordDictRoot = AnalyzerProfile.ANALYSIS_DATA_DIR;
>         singleInstance.load(wordDictRoot);
>       } catch (ClassNotFoundException e) {
>         throw new RuntimeException(e);
>       }
>     }
>     return singleInstance;
>   }
> {code}
> I think we should either correct this, document this, or disable custom dictionary support...
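
For comparison, the "correct this" option could look roughly like the sketch below: consult the configured directory first and fall back to the bundled data otherwise. This is only an illustration of the lookup order, not the Lucene code and not what the attached patch does; `DictionaryLookupSketch`, `Dict`, and `load` are hypothetical names, and no dictionary file is actually parsed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DictionaryLookupSketch {

    /** Hypothetical stand-in for a loaded dictionary; it only records where it came from. */
    static final class Dict {
        final String source;
        Dict(String source) { this.source = source; }
    }

    /**
     * Picks the dictionary source. A configured, existing directory wins;
     * otherwise the bundled (jar) data is used. This reverses the order in
     * the getInstance() snippet quoted above.
     */
    static Dict load(Path customDir) {
        if (customDir != null && Files.isDirectory(customDir)) {
            return new Dict("custom:" + customDir);
        }
        return new Dict("bundled");
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("dict");
        System.out.println(load(tmp).source);   // custom directory is configured
        System.out.println(load(null).source);  // nothing configured, use bundled data
    }
}
```

Even with the lookup order fixed this way, the format problems described in the comment above would remain, which is why the javadocs-only change is proposed instead.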

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
