lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project
Date Tue, 21 Jul 2009 08:54:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733538#action_12733538 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, I have looked at this patch and, more importantly, at the source itself, and I
increasingly get the impression that this analyzer and its related classes need more work
than just moving them into one package and making everything package private. From my
understanding, the Hidden Markov Model segmenter is a feature that could be replaced by some
other algorithm. Once you have such a feature relationship, I would prefer packages by
feature, which lets you remove a single feature just by removing a whole package.
In other words, I would love to see a general refactoring of the code that exposes a tiny
but common API in the base package, which is subsequently used by the HHMM "feature". There
is quite a bit of work to do that I do not consider 2.9 work.
So here is the question: do we keep the structure as it is and just move it to a new subdir
to build a separate jar, or do we move them into one single package (as you did in the patch)
and build up a clean HHMM package later in 3.*?
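
To make the packages-by-feature idea concrete, here is a minimal sketch. It is not taken
from the patch or from the existing code: the names SentenceSegmenter,
PlaceholderHhmmSegmenter and WhitespaceSegmenter are hypothetical, the nested types only
stand in for what would be separate packages, and the "HHMM" class is a placeholder that
splits per code point rather than running any model. It also uses newer Java syntax than
the 2.9 code base for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class FeaturePackagingSketch {

    // Would live in the base package: the tiny common API (hypothetical name).
    interface SentenceSegmenter {
        List<String> segment(String sentence);
    }

    // Would live in its own "hhmm" package; deleting that package removes the feature.
    // Placeholder only: emits one token per code point instead of running an HHMM.
    static final class PlaceholderHhmmSegmenter implements SentenceSegmenter {
        @Override
        public List<String> segment(String sentence) {
            List<String> tokens = new ArrayList<>();
            sentence.codePoints()
                    .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
            return tokens;
        }
    }

    // A second, interchangeable feature that would live in another package.
    static final class WhitespaceSegmenter implements SentenceSegmenter {
        @Override
        public List<String> segment(String sentence) {
            List<String> tokens = new ArrayList<>();
            for (String part : sentence.trim().split("\\s+")) {
                if (!part.isEmpty()) {
                    tokens.add(part);
                }
            }
            return tokens;
        }
    }

    // Base-package code depends only on the interface, never on a concrete feature.
    static int countTokens(SentenceSegmenter segmenter, String sentence) {
        return segmenter.segment(sentence).size();
    }

    public static void main(String[] args) {
        System.out.println(countTokens(new PlaceholderHhmmSegmenter(), "abc"));
        System.out.println(countTokens(new WhitespaceSegmenter(), "a b c"));
    }
}
```

The point of the sketch is only the dependency direction: the analyzer-side code sees the
interface, so a segmenter implementation can be swapped or dropped without touching it.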

Besides the packaging, I found heaps of things I do not like very much in the code (not your
patch :), and my fingertips get nervous when I see stuff like the AbstractDictionary hierarchy
or those singletons. I would really like to have this separation of CN and common analyzers
in for 2.9 -- we just need to decide which way we go. I guess moving it over without changing
code would be easiest.

simon


> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
>                 Key: LUCENE-1728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1728
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1728.txt, LUCENE-1728.txt, LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer jar to grow
to as much as 3MB. The dictionary is quite big compared to all the other resources / class
files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive users (e.g.
using Lucene on a mobile phone) to include analyzer.jar without getting into trouble with
disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small refactoring.
As Robert mentioned in [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722], several
classes should be package private, members and classes could be final, commented-out syserr
and logging code should be removed, etc.
> I set this issue's target to 2.9 - if we cannot make it by then, feel free to move it
to 3.0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

