lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis StopFilter.java
Date Thu, 11 Mar 2004 13:29:41 GMT
I agree, partially, with Doug about not copying thing and that use of
instanceof.

The part where I don't agree is where I agree with what Scott Ganyo
said, and with Erik's initial approach: use interfaces.  I don't see a
need to epxose that HashSet.  Just use Set.

Well, maybe not even an internal HashSet enforcement needs to be made. 
Why not leave it up to the caller to pick the Set implementation that
it wants to use?  Why enforce it in StopFilter?

I'm for:

public StopFilter(TokenStream in, Set stopWords) {
    super(in);
    this.stopWords = stopWords;


Otis (didn't follow the discussion closely, sorry if I repeated
somebody else's words or if I'm way off) Gospodnetic



--- Doug Cutting <cutting@apache.org> wrote:
> ehatcher@apache.org wrote:
> >   -  public StopFilter(TokenStream in, Set stopTable) {
> >   +  public StopFilter(TokenStream in, Set stopWords) {
> >        super(in);
> >   -    table = stopTable;
> >   +    this.stopWords = new HashSet(stopWords);
> >      }
> 
> This always allocates a new HashSet, which, if the stop list is
> large, 
> and documents are small, could impact performance.
> 
> Perhaps we can replace this with something like:
> 
> public StopFilter(TokenStream in, Set stopWords) {
>    this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
>             : new HashSet(stopWords));
> }
> 
> and then add another constructor:
> 
> private StopFilter(TokenStream in, HashSet stopWords) {
>    super(in);
>    this.stopWords = stopTable;
> }
> 
> Also, if we want the implementation to always be a HashSet
> internally, 
> for performance, we ought to declare the field to be a HashSet, no?
> 
> The competing goals here are:
>    1. Not to expose publicly the implementation of the Set;
>    2. Not to copy the contents of the Set when folks pass the value
> of 
> makeStopSet.
>    3. Use the most efficient implementation internally.
> 
> I think the changes above meet all of these.
> 
> Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message