lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
Date Fri, 29 Apr 2011 20:51:03 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027179#comment-13027179 ]

Uwe Schindler commented on LUCENE-3055:
---------------------------------------

{quote}
From my perspective the most important reason is to avoid a huge performance trap: previously,
if you subclassed one of these analyzers, overrode tokenStream(), and added SpecialFilter,
for example, most of the time users would actually slow down indexing, because reusableTokenStream()
could no longer be used by the indexer.
{quote}

Additionally, exactly this special case (overriding only one of the methods) was the biggest problem,
leading to ugly reflection-based checks in Lucene 3.0: In 3.0, StandardAnalyzer correctly implemented
both tokenStream() and reusableTokenStream(). As soon as a subclass overrode only tokenStream()
while the indexer still called reusableTokenStream(), the subclass's changes were not even used, leading
to lots of bug reports. Because of this, a reflection-based backwards-compatibility hack was added in 3.0
(see the o.a.l.util.VirtualMethod class, which makes this easier) that prevented the indexer from
calling reusableTokenStream() if a subclass overrode only one of the methods. Moving
forward to 3.1, these backwards hacks got even heavier (e.g. changes in TokenStreams,
the new base class ReusableAnalyzerBase, ...), so the only solution was to enforce the decorator
pattern.
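
To make the failure mode concrete, here is a minimal, self-contained sketch of the trap. It does not use Lucene at all: MiniAnalyzer, LowercasingSubclass and OverrideTrapDemo are hypothetical stand-ins, and token streams are modeled as plain lists of strings. It only assumes what is described above: the base class implements both entry points independently, and the indexer calls the reusable one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for illustration; these are NOT Lucene classes.
class OverrideTrapDemo {
    public static void main(String[] args) {
        MiniAnalyzer analyzer = new LowercasingSubclass();
        // Called directly, the subclass's change is visible.
        System.out.println(analyzer.tokenStream("Foo Bar"));         // prints [foo, bar]
        // The (hypothetical) indexer calls this entry point, which bypasses the override.
        System.out.println(analyzer.reusableTokenStream("Foo Bar")); // prints [Foo, Bar]
    }
}

/** Models a 3.0-style analyzer that implements two independent entry points. */
class MiniAnalyzer {
    List<String> tokenStream(String text) {
        return split(text);
    }

    /** The entry point the indexer is assumed to call. */
    List<String> reusableTokenStream(String text) {
        return split(text);
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            tokens.add(t);
        }
        return tokens;
    }
}

/** Overrides only tokenStream(); reusableTokenStream() is left untouched. */
class LowercasingSubclass extends MiniAnalyzer {
    @Override
    List<String> tokenStream(String text) {
        List<String> out = new ArrayList<>();
        for (String t : super.tokenStream(text)) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```

The subclass compiles and looks correct, yet the lowercasing silently disappears on the path the indexer uses, which is the kind of bug report described above.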

The above example by Robert is the correct way to implement your "factory" of TokenStreams.
Everything else, like subclassing StandardAnalyzer, is ugly because it hides what you are really
doing. The above pattern does exactly what Solr's schema does: you have to explicitly
list all your components, making it clear what your TokenStreams are doing.

Trust me, the above example is shorter than completely subclassing the previous StandardAnalyzer
(overriding both tokenStream() and reusableTokenStream()), and it shows, like Solr's schema.xml, what
your Analyzer looks like (no hidden stuff in super factories, ...).
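
As a rough illustration of the "list all your components explicitly" idea, the following self-contained sketch models an analyzer with a single factory method and a final entry point. It is not Robert's example and not Lucene's API: ChainAnalyzerBase, MyAnalyzer, Filters and the method names (including createComponents) are hypothetical names chosen for illustration, and filters are modeled as functions over lists of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for illustration; these are NOT Lucene classes.
class ExplicitChainDemo {
    public static void main(String[] args) {
        System.out.println(new MyAnalyzer().analyze("The Quick Fox")); // prints [quick, fox]
    }
}

/** One abstract factory method, one final entry point: the two cannot drift apart. */
abstract class ChainAnalyzerBase {
    /** Subclasses list every filter here, in the order it is applied. */
    protected abstract List<UnaryOperator<List<String>>> createComponents();

    public final List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            tokens.add(t);
        }
        for (UnaryOperator<List<String>> filter : createComponents()) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }
}

/** The whole chain is visible in one place; nothing is inherited from a concrete analyzer. */
final class MyAnalyzer extends ChainAnalyzerBase {
    @Override
    protected List<UnaryOperator<List<String>>> createComponents() {
        List<UnaryOperator<List<String>>> chain = new ArrayList<>();
        chain.add(Filters::lowerCase);
        chain.add(Filters::dropThe); // stand-in for a custom filter such as "SpecialFilter"
        return chain;
    }
}

final class Filters {
    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    static List<String> dropThe(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (!t.equals("the")) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Because there is only one place to define the chain, there is no second method that a subclass can forget to override, and a reader sees every component without digging through a superclass.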

> LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-3055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3055
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 3.1
>            Reporter: Ian Soboroff
>
> LUCENE-2372 and LUCENE-2389 marked all analyzers as final.  This makes ReusableAnalyzerBase
> useless, and makes it impossible to subclass e.g. StandardAnalyzer to make a small modification
> e.g. to tokenStream().  These issues don't indicate a new method of doing this.  The issues
> don't give a reason except for design considerations, which seems a poor reason to make a
> backward-incompatible change.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
