lucene-dev mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: StrictAnalyzer
Date Wed, 20 Feb 2002 17:30:20 GMT
> From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net]
> 
> I know at least in my case, I have a much more extensive list of stop
> words and they are simply read from a file into an array and then passed
> to the existing class. Would this approach work in your case?

I think that serious applications will usually need to define an Analyzer
class, or at least parameterize an existing class, rather than just use
something as-is off the shelf.  They might want to analyze different fields
differently, or might want to use a particular stop list, or might care
about how particular acronyms are tokenized and normalized.

So we should not attempt to provide analyzers that make everyone happy: that
effort is destined to fail.  Rather, we should attempt to provide tools to
make it easy to create lots of different, useful analyzers.

I think the proposed StrictAnalyzer shows that the analyzer toolkit is good:
Alan was able to create the analyzer he needs with just a few lines of code,
mostly assembling existing bits and pieces.  It would be simpler yet if he
were able to extend StandardAnalyzer, providing just a different stop list.

So the action item I see is that StandardAnalyzer should be made non-final.
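
To illustrate why a non-final analyzer helps, here is a minimal,
self-contained sketch. The classes SimpleStopAnalyzer and LegalStopAnalyzer
are hypothetical stand-ins written for this example. They are not Lucene
classes and do not reproduce StandardAnalyzer's tokenization; they only show
that a subclass can swap in a different stop list with a few lines of code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a non-final analyzer; NOT Lucene's StandardAnalyzer.
class SimpleStopAnalyzer {
    /** Subclasses override this to supply their own stop list. */
    protected String[] stopWords() {
        return new String[] {"a", "an", "and", "of", "the"};
    }

    /** Lowercases, splits on non-letter/non-digit characters, drops stop words. */
    public final List<String> analyze(String text) {
        Set<String> stops = new HashSet<String>(Arrays.asList(stopWords()));
        List<String> tokens = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty leading token; skip it along with stop words.
            if (token.length() > 0 && !stops.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}

// Because the base class is not final, a different stop list takes a few lines.
class LegalStopAnalyzer extends SimpleStopAnalyzer {
    @Override
    protected String[] stopWords() {
        return new String[] {"hereby", "whereas"};
    }
}
```

With the default list, "the" and "of" are dropped; with the subclass, only
the words it names are dropped and everything else is kept.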

We should not change the default stop lists in Lucene, since that would
break existing indexes when folks upgrade to a new version of Lucene.  A
library of file-based stop lists is a good idea, though.
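
A file-based stop list could be loaded along the lines Dmitry describes:
read the words into an array, then hand the array to the analyzer. The
sketch below assumes a simple format of one word per line, with blank lines
and lines starting with '#' ignored. StopListLoader and that file format are
assumptions made for this example, not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
class StopListLoader {
    /**
     * Reads one stop word per line from the given source.
     * Blank lines and lines starting with '#' are skipped (assumed format).
     */
    static String[] load(Reader source) {
        List<String> words = new ArrayList<String>();
        try {
            BufferedReader in = new BufferedReader(source);
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (word.length() == 0 || word.startsWith("#")) {
                    continue;
                }
                words.add(word);
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers need no try/catch in simple setups.
            throw new UncheckedIOException(e);
        }
        return words.toArray(new String[0]);
    }
}
```

A caller would typically wrap a java.io.FileReader around the stop-list file,
pass it to load(), and close the reader afterwards. The resulting array can
then be given to any analyzer that accepts a stop-word array.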

Doug
