lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors
Date Thu, 17 Dec 2009 00:44:18 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791701#action_12791701 ]

Simon Willnauer commented on LUCENE-2034:
-----------------------------------------

Robert, should we hold off once more and move StopawareAnalyzer into core? As you suggested,
StopwordAnalyzerBase would be a better name for it and far more consistent. That way we could
implement StopAnalyzer on top of it too.
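
To make the idea concrete, here is a minimal sketch in plain Java of a shared stopword base class with a StopAnalyzer-like subclass built on it. It does not use Lucene, and the names StopwordBaseSketch, SimpleStopSketch, getStopwordSet, isStopword and analyze are assumptions made for illustration, not the classes or methods from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical base class: owns the stopword set so that subclasses do not
// each reinvent storage and loading. Not the class from the patch.
abstract class StopwordBaseSketch {
    private final Set<String> stopwords;

    // Single constructor: callers convert files, arrays, etc. to a Set first.
    protected StopwordBaseSketch(Set<String> stopwords) {
        this.stopwords = stopwords == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(new HashSet<String>(stopwords));
    }

    // Read-only view of the configured stopwords.
    public Set<String> getStopwordSet() {
        return stopwords;
    }

    protected boolean isStopword(String term) {
        return stopwords.contains(term);
    }
}

// Hypothetical stand-in for a StopAnalyzer implemented on the shared base.
class SimpleStopSketch extends StopwordBaseSketch {
    SimpleStopSketch(Set<String> stopwords) {
        super(stopwords);
    }

    // Lowercases the text, splits on non-letters, and drops stopwords.
    List<String> analyze(String text) {
        List<String> out = new ArrayList<String>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!tok.isEmpty() && !isStopword(tok)) {
                out.add(tok);
            }
        }
        return out;
    }
}
```

With this shape a concrete analyzer only declares one constructor taking a Set and inherits the rest, which is the consistency the rename is meant to signal.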


> Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2034
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2034
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Simon Willnauer
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2034,patch, LUCENE-2034,patch, LUCENE-2034.patch, LUCENE-2034.patch,
LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch,
LUCENE-2034.patch, LUCENE-2034.txt
>
>
> Due to the various tokenStream APIs we have had in Lucene, analyzer subclasses need to implement
at least one of the methods returning a TokenStream. When both are implemented in the same
analyzer, the code is almost identical. Each analyzer defines the same inner class
(SavedStreams), which is unnecessary.
> In contrib, almost every analyzer uses stopwords, and each of them either has its own way
of loading them or defines a large number of ctors to load stopwords from a file, set, array,
etc. Those ctors should be deprecated and eventually removed.
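
The first point in the description, the near-identical stream methods and the repeated SavedStreams class, can be illustrated with a template-method sketch in plain Java. This is an assumption about one way to remove the duplication, not the code from the attached patches; StreamSketch, ReusableBaseSketch, WhitespaceSketch and createStream are hypothetical names, and no Lucene types are used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token stream; not a Lucene type.
class StreamSketch {
    private String input;
    int resets = 0;

    StreamSketch(String input) {
        this.input = input;
    }

    // Points the existing stream at new input instead of allocating a new one.
    void reset(String newInput) {
        this.input = newInput;
        resets++;
    }

    // Splits the current input on whitespace.
    List<String> tokens() {
        List<String> out = new ArrayList<String>();
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        for (String tok : trimmed.split("\\s+")) {
            out.add(tok);
        }
        return out;
    }
}

// Hypothetical base class: both public methods are written once here,
// so subclasses no longer need their own SavedStreams-style holder.
abstract class ReusableBaseSketch {
    // One cached stream per thread.
    private final ThreadLocal<StreamSketch> saved = new ThreadLocal<StreamSketch>();

    // The only method a subclass has to implement.
    protected abstract StreamSketch createStream(String text);

    // Always builds a fresh stream.
    public final StreamSketch tokenStream(String text) {
        return createStream(text);
    }

    // Reuses the cached stream for the calling thread when one exists.
    public final StreamSketch reusableTokenStream(String text) {
        StreamSketch s = saved.get();
        if (s == null) {
            s = createStream(text);
            saved.set(s);
        } else {
            s.reset(text);
        }
        return s;
    }
}

// Example subclass: one short method instead of two near-identical ones.
class WhitespaceSketch extends ReusableBaseSketch {
    @Override
    protected StreamSketch createStream(String text) {
        return new StreamSketch(text);
    }
}
```

Marking the two public methods final keeps subclasses from reintroducing divergent copies; whether that restriction is acceptable for existing analyzers is a separate design question.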


