lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis StopFilter.java
Date Thu, 11 Mar 2004 02:18:22 GMT
On Mar 10, 2004, at 1:08 PM, Doug Cutting wrote:
> ehatcher@apache.org wrote:
>>   -  public StopFilter(TokenStream in, Set stopTable) {
>>   +  public StopFilter(TokenStream in, Set stopWords) {
>>        super(in);
>>   -    table = stopTable;
>>   +    this.stopWords = new HashSet(stopWords);
>>      }
>
> This always allocates a new HashSet, which, if the stop list is large, 
> and documents are small, could impact performance.

OK, after some more thinking on this, part of the dilemma is also that 
analyzers generally construct all of their tokenizers and token filters 
in the tokenStream method.  It would seem better for them to keep 
instance variables for all the invariant pieces.
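
A minimal sketch of that idea, using hypothetical stand-in classes 
(SimpleAnalyzer and SimpleStopFilter are illustrative names, not the real 
Lucene types, and the generics are modern Java rather than what the 
codebase used in 2004). The analyzer builds the stop set once and each 
filter only keeps a reference to it, so per-document construction cost 
does not grow with the size of the stop list:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class CachedStopSetSketch {

    /** Hypothetical stand-in for a stop filter; not Lucene's StopFilter. */
    static final class SimpleStopFilter {
        private final Set<String> stopWords;

        // Keeps a reference to the caller's set instead of copying it,
        // so building a filter is cheap even for a large stop list.
        SimpleStopFilter(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        boolean isStopWord(String term) {
            return stopWords.contains(term);
        }
    }

    /** Hypothetical stand-in for an analyzer. */
    static final class SimpleAnalyzer {
        // The invariant piece: built once per analyzer and never modified.
        private final Set<String> stopWords;

        SimpleAnalyzer(String[] words) {
            this.stopWords =
                Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
        }

        // Called once per document or field; only the filter is allocated.
        SimpleStopFilter newFilter() {
            return new SimpleStopFilter(stopWords);
        }
    }

    public static void main(String[] args) {
        SimpleAnalyzer analyzer = new SimpleAnalyzer(new String[] {"a", "the", "of"});
        SimpleStopFilter first = analyzer.newFilter();
        SimpleStopFilter second = analyzer.newFilter();
        System.out.println(first.isStopWord("the"));
        System.out.println(second.isStopWord("lucene"));
    }
}
```

Wrapping the set as unmodifiable is one way to make sharing safe without 
a defensive copy in every filter; whether that trade-off suits StopFilter 
itself is a separate question.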

With the change to HashSet, any custom analyzers will be using the 
Hashtable ctor thinking it is the most efficient one, and now it is not. 
(Once the dust settles on this change, I'll convert the built-in code to 
use the new methods.)  Is this a problem?

	Erik

