lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis StopFilter.java
Date Wed, 10 Mar 2004 18:08:24 GMT
ehatcher@apache.org wrote:
>   -  public StopFilter(TokenStream in, Set stopTable) {
>   +  public StopFilter(TokenStream in, Set stopWords) {
>        super(in);
>   -    table = stopTable;
>   +    this.stopWords = new HashSet(stopWords);
>      }

This always allocates a new HashSet and copies the whole stop list into it,
which could hurt performance when the stop list is large and the documents
are small.

Perhaps we can replace this with something like:

public StopFilter(TokenStream in, Set stopWords) {
   this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
            : new HashSet(stopWords));
}

and then add another constructor:

private StopFilter(TokenStream in, HashSet stopWords) {
   super(in);
   this.stopWords = stopWords;
}

Also, if we want the implementation to always be a HashSet internally, 
for performance, we ought to declare the field to be a HashSet, no?

The competing goals here are:
   1. Not to publicly expose the implementation of the Set;
   2. Not to copy the contents of the Set when folks pass the value of
makeStopSet;
   3. To use the most efficient implementation internally.

I think the changes above meet all of these.
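
For illustration, here is a self-contained sketch of the two-constructor
pattern described above. It is not the committed StopFilter code:
StopSetHolder, isStopWord and sharesStorageWith are hypothetical names, the
TokenStream argument is left out so it compiles without Lucene, and generics
are used only to avoid raw-type warnings on current JDKs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for StopFilter (assumption: not the real Lucene class).
class StopSetHolder {
    // Declared as HashSet so the internal implementation is fixed (goal 3).
    private final HashSet<String> stopWords;

    // Public constructor accepts any Set (goal 1) and copies the contents
    // only when the argument is not already a HashSet (goal 2).
    public StopSetHolder(Set<String> stopWords) {
        this(stopWords instanceof HashSet
                ? (HashSet<String>) stopWords
                : new HashSet<String>(stopWords));
    }

    // Private constructor stores the HashSet as given, without copying.
    private StopSetHolder(HashSet<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Membership test against the internal set.
    boolean isStopWord(String term) {
        return stopWords.contains(term);
    }

    // Demonstration helper: true if the given set object is the one stored.
    boolean sharesStorageWith(Set<String> candidate) {
        return stopWords == candidate;
    }
}
```

Passing a HashSet stores the caller's object directly, so later changes the
caller makes to that set are visible through the holder. Passing any other
Set (a TreeSet, say) produces a private copy.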

Doug
