lucene-dev mailing list archives

From Joshua O'Madadhain <jmad...@ics.uci.edu>
Subject Re: StrictAnalyzer Proposal
Date Wed, 20 Feb 2002 17:47:21 GMT
On Wed, 20 Feb 2002, Otis Gospodnetic wrote:

> > (1) I rewrote StandardAnalyzer as StrictAnalyzer for the project I am
> > working on.  StandardAnalyzer does not filter enough words for my liking.
> > Basically all I did was add to the STOP_WORDS array.  The stop words I
> > added are based on the default values in SQL Server 2000's text indexing.
> > (Source code below)
> 
> The change seems simple and looks fine to me.  If nobody complains
> until tonight I'll commit it.

As Dmitry said, it seems to me that adding classes to a project which
differ from one another only in static data is poor software engineering
practice, and probably confusing to users.  Since StopAnalyzer has a
constructor which allows users to specify their own arrays of stop words,
I'm not sure what the benefit of StrictAnalyzer is.
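A rough illustration of the alternative (a sketch only, not Lucene code): one
analyzer class that takes its stop list as a constructor argument, so a
"stricter" analyzer is just the same class built with a longer array.
ConfigurableStopAnalyzer, STRICTER_STOP_WORDS and the toy tokenizer are
hypothetical stand-ins so the example runs without Lucene; the word lists are
illustrative and are NOT the SQL Server 2000 list from the proposal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordDemo {

    // Hypothetical stand-in for an analyzer whose stop list is supplied
    // at construction time (this is not Lucene's real StopAnalyzer).
    static final class ConfigurableStopAnalyzer {
        private final Set<String> stopWords = new HashSet<>();

        ConfigurableStopAnalyzer(String[] stopWords) {
            for (String w : stopWords) {
                this.stopWords.add(w.toLowerCase(Locale.ROOT));
            }
        }

        // Toy tokenizer: lowercases, splits on runs of non-letters,
        // and drops any token found in the stop list.
        List<String> analyze(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!t.isEmpty() && !stopWords.contains(t)) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    // Illustrative lists only; the "stricter" one merely adds more words.
    static final String[] DEFAULT_STOP_WORDS = {"a", "an", "and", "the", "of"};
    static final String[] STRICTER_STOP_WORDS =
        {"a", "an", "and", "the", "of", "about", "after", "all"};

    public static void main(String[] args) {
        String text = "All about the design of an analyzer";
        ConfigurableStopAnalyzer standard = new ConfigurableStopAnalyzer(DEFAULT_STOP_WORDS);
        ConfigurableStopAnalyzer strict = new ConfigurableStopAnalyzer(STRICTER_STOP_WORDS);
        System.out.println(standard.analyze(text)); // [all, about, design, analyzer]
        System.out.println(strict.analyze(text));   // [design, analyzer]
    }
}
```

The only difference between the two analyzers is data, which is the point
above: the behaviour lives in one class, and a new stop list needs no new class.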

On the other hand, I do think that providing a repository of alternative
prefabricated stop word arrays would be useful to users.  I suggest the
following:

(1) Create an area on the Lucene website for a repository of such
things.  (Does Lucene have a 'contributions' ftp site?)
(2) Leave StopAnalyzer as is, to avoid confusion by people upgrading to
the new version, but include a link in the documentation to the
aforementioned repository.

Joshua
 
 jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
    Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.







