lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis StopFilter.java
Date Thu, 11 Mar 2004 13:40:45 GMT
HashSet is fine, and the best we've currently got, but let the caller
of StopFilter choose that.  Other, better, Set implementations may be
widely available one day.

Otis

--- Erik Hatcher <erik@ehatchersolutions.com> wrote:
> This, too, is my preferred approach.... it just seemed such a strong 
> argument for HashSet everywhere.   I particularly agree with the lack
> 
> of need to enforce HashSet.
> 
> 	Erik
> 
> On Mar 11, 2004, at 8:29 AM, Otis Gospodnetic wrote:
> 
> > I agree, partially, with Doug about not copying thing and that use
> of
> > instanceof.
> >
> > The part where I don't agree is where I agree with what Scott Ganyo
> > said, and with Erik's initial approach: use interfaces.  I don't
> see a
> > need to epxose that HashSet.  Just use Set.
> >
> > Well, maybe not even an internal HashSet enforcement needs to be
> made.
> > Why not leave it up to the caller to pick the Set implementation
> that
> > it wants to use?  Why enforce it in StopFilter?
> >
> > I'm for:
> >
> > public StopFilter(TokenStream in, Set stopWords) {
> >     super(in);
> >     this.stopWords = stopWords;
> >
> >
> > Otis (didn't follow the discussion closely, sorry if I repeated
> > somebody else's words or if I'm way off) Gospodnetic
> >
> >
> >
> > --- Doug Cutting <cutting@apache.org> wrote:
> >> ehatcher@apache.org wrote:
> >>>   -  public StopFilter(TokenStream in, Set stopTable) {
> >>>   +  public StopFilter(TokenStream in, Set stopWords) {
> >>>        super(in);
> >>>   -    table = stopTable;
> >>>   +    this.stopWords = new HashSet(stopWords);
> >>>      }
> >>
> >> This always allocates a new HashSet, which, if the stop list is
> >> large,
> >> and documents are small, could impact performance.
> >>
> >> Perhaps we can replace this with something like:
> >>
> >> public StopFilter(TokenStream in, Set stopWords) {
> >>    this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
> >>             : new HashSet(stopWords));
> >> }
> >>
> >> and then add another constructor:
> >>
> >> private StopFilter(TokenStream in, HashSet stopWords) {
> >>    super(in);
> >>    this.stopWords = stopTable;
> >> }
> >>
> >> Also, if we want the implementation to always be a HashSet
> >> internally,
> >> for performance, we ought to declare the field to be a HashSet,
> no?
> >>
> >> The competing goals here are:
> >>    1. Not to expose publicly the implementation of the Set;
> >>    2. Not to copy the contents of the Set when folks pass the
> value
> >> of
> >> makeStopSet.
> >>    3. Use the most efficient implementation internally.
> >>
> >> I think the changes above meet all of these.
> >>
> >> Doug
> >
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message