lucene-dev mailing list archives

From Doug Cutting
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis
Date Wed, 10 Mar 2004 18:08:24 GMT

wrote:
>   -  public StopFilter(TokenStream in, Set stopTable) {
>   +  public StopFilter(TokenStream in, Set stopWords) {
>        super(in);
>   -    table = stopTable;
>   +    this.stopWords = new HashSet(stopWords);
>      }

This always allocates and fills a new HashSet each time a StopFilter is
constructed, which, if the stop list is large and documents are small,
could hurt performance.

Perhaps we can replace this with something like:

public StopFilter(TokenStream in, Set stopWords) {
   this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
            : new HashSet(stopWords));
}

and then add another constructor:

private StopFilter(TokenStream in, HashSet stopWords) {
   super(in);
   this.stopWords = stopWords;
}

Also, if we want the implementation to always be a HashSet internally, 
for performance, we ought to declare the field to be a HashSet, no?

The competing goals here are:
   1. Not to expose publicly the implementation of the Set;
   2. Not to copy the contents of the Set when folks pass the value of 
   3. Use the most efficient implementation internally.

I think the changes above meet all of these.
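
For illustration only, here is a self-contained sketch of the constructor pair described above. It is not the actual Lucene code: StopSetHolder is a hypothetical stand-in for StopFilter with the TokenStream plumbing left out, isStopWord and sharesSet are helper names invented for the example, and it uses generics, which postdate this message.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for StopFilter; TokenStream handling is omitted
// so that the example compiles on its own.
class StopSetHolder {
    // Declared as HashSet so the internal implementation is known to be hash-based.
    private final HashSet<String> stopWords;

    // Public constructor: accepts any Set and copies it only when it is
    // not already a HashSet.
    public StopSetHolder(Set<String> stopWords) {
        this(stopWords instanceof HashSet
                ? (HashSet<String>) stopWords
                : new HashSet<String>(stopWords));
    }

    // Private constructor: keeps the HashSet type out of the public API.
    // Overload resolution picks this one because the argument's static
    // type above is HashSet<String>.
    private StopSetHolder(HashSet<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Illustrative helper: membership test against the stop list.
    boolean isStopWord(String term) {
        return stopWords.contains(term);
    }

    // Illustrative helper: true when the caller's set is used directly
    // instead of a copy.
    boolean sharesSet(Set<String> candidate) {
        return stopWords == candidate;
    }
}
```

Passing a HashSet reuses the caller's instance, while any other Set (a TreeSet, say) is copied once. One consequence of sharing is that later changes the caller makes to that HashSet are visible through the holder.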

