From: Otis Gospodnetic
Date: Thu, 11 Mar 2004 05:29:41 -0800 (PST)
Subject: Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis StopFilter.java
To: Lucene Developers List
In-Reply-To: <404F5998.1040900@apache.org>
Message-ID: <20040311132941.854.qmail@web12701.mail.yahoo.com>

I agree, partially, with Doug about not copying things and about that use of instanceof.
The part where I don't agree is where I agree with what Scott Ganyo said, and with Erik's initial approach: use interfaces.
I don't see a need to expose that HashSet. Just use Set.
Well, maybe StopFilter doesn't even need to enforce a HashSet internally.
Why not leave it up to the caller to pick the Set implementation that it wants to use? Why enforce it in StopFilter?

I'm for:

  public StopFilter(TokenStream in, Set stopWords) {
    super(in);
    this.stopWords = stopWords;
  }

Otis (didn't follow the discussion closely, sorry if I repeated somebody else's words or if I'm way off) Gospodnetic

--- Doug Cutting wrote:
> ehatcher@apache.org wrote:
> >   - public StopFilter(TokenStream in, Set stopTable) {
> >   + public StopFilter(TokenStream in, Set stopWords) {
> >       super(in);
> >   -   table = stopTable;
> >   +   this.stopWords = new HashSet(stopWords);
> >     }
>
> This always allocates a new HashSet, which, if the stop list is large,
> and documents are small, could impact performance.
>
> Perhaps we can replace this with something like:
>
>   public StopFilter(TokenStream in, Set stopWords) {
>     this(in, stopWords instanceof HashSet ? ((HashSet)stopWords)
>              : new HashSet(stopWords));
>   }
>
> and then add another constructor:
>
>   private StopFilter(TokenStream in, HashSet stopWords) {
>     super(in);
>     this.stopWords = stopWords;
>   }
>
> Also, if we want the implementation to always be a HashSet internally,
> for performance, we ought to declare the field to be a HashSet, no?
>
> The competing goals here are:
>   1. Not to expose publicly the implementation of the Set;
>   2. Not to copy the contents of the Set when folks pass the value of
>      makeStopSet.
>   3. Use the most efficient implementation internally.
>
> I think the changes above meet all of these.
>
> Doug
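
For comparison, here is a minimal, self-contained sketch of the two constructor styles discussed in this thread. It is only an illustration under stated assumptions: StopWordCheck, hashSetBacked, isStopWord and backingSet are hypothetical names, the class is a stand-in rather than Lucene's StopFilter, and it uses generics, which the code in this thread does not.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for StopFilter; not the Lucene class.
class StopWordCheck {
    private final Set<String> stopWords;

    // Style 1: keep whatever Set the caller passes. No copy is made and
    // the caller decides which Set implementation is used.
    StopWordCheck(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Style 2: reuse the argument when it already is a HashSet,
    // otherwise copy its contents into a new HashSet.
    static StopWordCheck hashSetBacked(Set<String> stopWords) {
        Set<String> set = (stopWords instanceof HashSet)
                ? stopWords
                : new HashSet<String>(stopWords);
        return new StopWordCheck(set);
    }

    boolean isStopWord(String term) {
        return stopWords.contains(term);
    }

    // Exposed only so the sketch can show which Set instance is held.
    Set<String> backingSet() {
        return stopWords;
    }
}
```

With style 1, a caller who passes a TreeSet gets exactly that TreeSet back from backingSet(). With style 2, the same TreeSet is copied into a new HashSet, while a HashSet argument is kept as the same instance.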