lucene-java-user mailing list archives

From "Rasik Pandey" <rasik.pan...@ajlsm.com>
Subject RE : Analyzers
Date Thu, 17 Jun 2004 10:13:59 GMT
> For the reflection way, I use a configuration file that
> specifies the initial state and then use a no-argument
> constructor.  Since I don't think that is very generalizable, I
> thought maybe you could do a copy() and then a reset() method
> (similar to the JSP tag release() method).  The copy() method
> would create a new memory object. Then reset would put itself
> back into a clean state.  Many of the filters have read-only
> information (such as the Stop filter), so reset wouldn't have
> to do anything.  Others may require more, such as preserving
> the initialization state (which will take up extra memory).  I
> don't know if this is a generalizable process or if it is worth
> the effort.  The reflection way works well for me b/c I am
> already using a configuration file, so a few more properties
> aren't a big deal.  One of the things that is nice about Lucene
> is it doesn't require a configuration file.
> 
> Not sure if this is enough to go, so let me know.
> 
> -Grant

So I gave this a little thought...

AbstractTokenizer could become 
CloneableTokenizer implements Tokenizer, Cloneable 

AbstractTokenFilter could become 
CloneableTokenFilter implements TokenFilter, Cloneable

In both, the clone() method would return a new object. That would let implementations like
BaseAnalyzer take advantage of your init() methods and setters (AbstractTokenFilter.setTokenStream
and AbstractTokenizer.setReader), OR let each CloneableTokenizer and CloneableTokenFilter
implementation generate its new object using its own constructor-based dependency injection.
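
A rough sketch of the first option (clone() plus a setter), just to make the idea concrete. It is written in current Java syntax and does not use the real Lucene classes: SketchTokenStream, StopFilterSketch and ListTokenStream are made-up stand-ins, and only the names CloneableTokenFilter and setTokenStream come from the discussion above.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a token stream: next() returns null when exhausted.
interface SketchTokenStream {
    String next();
}

// Prototype-style filter: clone() yields an unbound copy, and
// setTokenStream() binds that copy to its input.
abstract class CloneableTokenFilter implements SketchTokenStream, Cloneable {
    protected SketchTokenStream input;

    public void setTokenStream(SketchTokenStream input) {
        this.input = input;
    }

    @Override
    public CloneableTokenFilter clone() {
        try {
            CloneableTokenFilter copy = (CloneableTokenFilter) super.clone();
            copy.input = null; // per-stream state is never shared with the prototype
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: the class implements Cloneable
        }
    }
}

// Hypothetical stop-word filter; its configuration is read-only.
class StopFilterSketch extends CloneableTokenFilter {
    private final Set<String> stopWords; // safe to share between clones

    StopFilterSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    @Override
    public String next() {
        String token;
        while ((token = input.next()) != null) {
            if (!stopWords.contains(token)) {
                return token;
            }
        }
        return null;
    }
}

// Hypothetical token source backed by a list, used only to drive the filter.
class ListTokenStream implements SketchTokenStream {
    private final Iterator<String> tokens;

    ListTokenStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    @Override
    public String next() {
        return tokens.hasNext() ? tokens.next() : null;
    }
}
```

An analyzer would keep one configured StopFilterSketch as a prototype and, for each field, call clone() and then setTokenStream(...) on the copy. Because the stop-word set is read-only, the shallow copy is enough here; a filter with mutable state would have to copy it in its own clone().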

We could also remove the need for the init() and setter methods in AbstractTokenizer and
AbstractTokenFilter by creating two abstract factory methods, CloneableTokenizer.clone(Reader)
and CloneableTokenFilter.clone(TokenStream), which would handle TokenStream construction using
the argument and any configured class member objects (stopWords, charsets, etc.).
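
Here is roughly what the factory-method variant might look like for the tokenizer side. Again this is only a sketch in current Java syntax, not the real Lucene API: SimpleTokenStream and WhitespaceTokenizerSketch are invented for illustration, and the lowerCase flag is just an example of a "configured class member".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a token stream: next() returns null when exhausted.
interface SimpleTokenStream {
    String next();
}

// Factory-style base class: a configured instance acts as a template, and
// clone(Reader) builds a fresh tokenizer bound to the given reader.
abstract class CloneableTokenizer implements SimpleTokenStream {
    public abstract CloneableTokenizer clone(Reader reader);
}

// Hypothetical whitespace tokenizer with one configured option.
class WhitespaceTokenizerSketch extends CloneableTokenizer {
    private final boolean lowerCase; // configuration, copied into every clone
    private final Reader reader;     // per-stream state, null on the template

    WhitespaceTokenizerSketch(boolean lowerCase) {
        this(lowerCase, null);
    }

    private WhitespaceTokenizerSketch(boolean lowerCase, Reader reader) {
        this.lowerCase = lowerCase;
        this.reader = reader;
    }

    @Override
    public CloneableTokenizer clone(Reader reader) {
        // No init() or setReader() needed: the constructor does the wiring.
        return new WhitespaceTokenizerSketch(lowerCase, reader);
    }

    @Override
    public String next() {
        if (reader == null) {
            throw new IllegalStateException("template instance has no reader");
        }
        StringBuilder token = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace((char) c)) {
                    if (token.length() > 0) {
                        break; // whitespace after characters ends the token
                    }
                } else {
                    token.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (token.length() == 0) {
            return null; // nothing left to read
        }
        return lowerCase ? token.toString().toLowerCase() : token.toString();
    }
}
```

The template instance only carries configuration, so an analyzer could hold a single one and call template.clone(reader) for every field without any shared mutable state.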

Your thoughts...

Regards,
RBP


