lucene-dev mailing list archives

From Doug Cutting
Subject Re: cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html
Date Sat, 21 Dec 2002 15:01:16 GMT
Otis Gospodnetic wrote:
> I wonder about SnowballAnalyzer and SnowballFilter
> classes.
> The ctor of the latter uses introspection to instantiate the appropriate
> Stemmer.
> In most use cases that will be the same Stemmer from call to call. 
> Seems like redundant work and objects created.
> Wouldn't it be better to have SnowballFilter 'cache' instances of
> previously instantiated Stemmers?
> I guess that would require that Snowball's Stemmers are thread
> safe....are they?

Compared to all of the tokens & strings that will be allocated when it 
is used, the allocation of the stemmer should not be significant.  And 
the stemmers are not thread safe anyway.
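
For illustration only: if caching ever did prove worthwhile, the lack of thread safety means an instance could not simply be shared. One possible shape is a per-thread cache, sketched below. `PerThreadStemmerCache` and `ToyStemmer` are hypothetical names invented for this example; they are not part of Snowball or Lucene.

```java
import java.util.HashMap;
import java.util.Map;

public class PerThreadStemmerCache {

    // Hypothetical stand-in for a stateful stemmer that is not thread safe:
    // it keeps the word being processed in a mutable field.
    public static class ToyStemmer {
        private String current = "";

        public void setCurrent(String word) { current = word; }
        public String getCurrent() { return current; }
    }

    // Each thread gets its own map from stemmer name to instance,
    // so no instance is ever visible to two threads.
    private static final ThreadLocal<Map<String, ToyStemmer>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    // Returns the calling thread's stemmer for the given name,
    // creating it on first use.
    public static ToyStemmer get(String name) {
        return CACHE.get().computeIfAbsent(name, n -> new ToyStemmer());
    }
}
```

Repeated calls to `get` on one thread return the same object, while another thread gets its own. Whether the saved allocation matters next to the per-token allocations is exactly what benchmarking would have to show.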

I don't particularly like the use of introspection either.  I copied it 
from Snowball's sample code.  Unfortunately there's no other way to do
this without modifying the Snowball code, which I'd rather not do.
Currently this project incorporates the Snowball code as-is, so that
if/when the Snowball project updates things it should be very easy to 
integrate those updates.
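
The general pattern being described, choosing a stemmer class by name at run time, can be sketched roughly as follows. This is a self-contained illustration, not the actual SnowballFilter code: `ReflectiveStemmerDemo` and its nested `EnglishStemmer` are hypothetical stand-ins, and the `setCurrent`/`stem`/`getCurrent` method names are assumed for the example.

```java
public class ReflectiveStemmerDemo {

    // Hypothetical stand-in for a generated stemmer class; not the real one.
    public static class EnglishStemmer {
        private String current = "";

        public void setCurrent(String word) { current = word; }
        public String getCurrent() { return current; }

        // Toy rule: strip one trailing "s". Real stemmers do far more.
        public boolean stem() {
            if (current.length() > 1 && current.endsWith("s")) {
                current = current.substring(0, current.length() - 1);
            }
            return true;
        }
    }

    // Builds a class name from the language name and instantiates it
    // reflectively, so no stemmer class is named at compile time.
    public static Object newStemmer(String name) {
        String className =
            ReflectiveStemmerDemo.class.getName() + "$" + name + "Stemmer";
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No stemmer for " + name, e);
        }
    }

    // Drives the stemmer purely through reflection, since the caller
    // has no common compile-time type to cast to in this sketch.
    public static String stem(Object stemmer, String word) {
        try {
            Class<?> c = stemmer.getClass();
            c.getMethod("setCurrent", String.class).invoke(stemmer, word);
            c.getMethod("stem").invoke(stemmer);
            return (String) c.getMethod("getCurrent").invoke(stemmer);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Stemming failed", e);
        }
    }
}
```

The cost of this approach is one class lookup and a few method lookups per instantiation. The benefit is that the stemmer classes themselves stay untouched.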

This project is still a work in progress.  I want to do some 
benchmarking, more testing and add better documentation before I make a 
release and announce its availability.  If the benchmarking shows major 
performance problems, then I may have to look at optimizing the Snowball 
code, but I hope to avoid that.

