From: "Hoss Man (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Date: Thu, 1 Nov 2007 14:10:51 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-1040) Can't quickly create StopFilter
Message-ID: <13307671.1193951451036.JavaMail.jira@brutus>
In-Reply-To: <27314913.1193864570975.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539469 ]

Hoss Man commented on LUCENE-1040:
----------------------------------

> But it does if the Set is not a CharArraySet.
> Docs should be clearer though that ignoreCase is ignored if passing a CharArraySet (or is there a more sane way?)

Hmmm... actually... does making the CharArraySet capable of doing case-insensitive comparisons overcomplicate things? Perhaps the CharArraySet should just be a set of strings that supports fast lookup of strings, and StopFilter.next should be responsible for lowercasing the terms before doing the set lookup (and makeStopSet should be responsible for lowercasing the terms before putting them in the set).

It would probably be good to change the sigs to something like

    StopFilter(TokenStream s, Set stopWords, boolean lowerCaseBeforeLookup)
    makeStopSet(String[] stopWords, boolean lowerCaseBeforeAdding)

so it is clear what expectations StopFilter has of the set if people make one from scratch.

This is kind of an orthogonal API discussion to the CharArraySet issue though... the same arguments could be made about the 2.2 instance of StopFilter... I just bring it up as a potential point of confusion, since people could expect these two to work the same way, and they won't (unless I'm missing something)...

    Set a = ...; // something with lots of mixed case words
    Set b = new CharArraySet(a, false);
    StopFilter aaa = new StopFilter(stream, a, true);
    StopFilter bbb = new StopFilter(stream, b, true);

> Can't quickly create StopFilter
> -------------------------------
>
>                 Key: LUCENE-1040
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1040
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>         Attachments: CharArraySet.patch, CharArraySet.take2.patch
>
>
> Due to the use of CharArraySet by StopFilter, one can no longer efficiently pre-create a Set for use by future StopFilter instances.
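
To make the proposal in the comment above concrete, here is a minimal sketch of the "plain set, caller does the lowercasing" idea. It is an illustration only, not Lucene code: it uses java.util.HashSet in place of CharArraySet, the class name StopSetSketch and the helper isStopWord are hypothetical, and makeStopSet merely borrows the signature suggested in the comment.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the proposal; this is not Lucene's StopFilter or CharArraySet.
final class StopSetSketch {

    // Builds a plain set of strings, optionally lowercasing each word before adding it.
    static Set<String> makeStopSet(String[] stopWords, boolean lowerCaseBeforeAdding) {
        Set<String> set = new HashSet<>();
        for (String word : stopWords) {
            set.add(lowerCaseBeforeAdding ? word.toLowerCase(Locale.ROOT) : word);
        }
        return set;
    }

    // The set only does exact matching; the caller decides whether to lowercase the term first.
    static boolean isStopWord(Set<String> stopWords, String term, boolean lowerCaseBeforeLookup) {
        String key = lowerCaseBeforeLookup ? term.toLowerCase(Locale.ROOT) : term;
        return stopWords.contains(key);
    }

    public static void main(String[] args) {
        String[] words = {"The", "AND", "of"};
        Set<String> lowered = makeStopSet(words, true);   // contents were lowercased on the way in
        Set<String> asGiven = makeStopSet(words, false);  // contents keep their mixed case

        // Same term, same lookup flag, different result depending on how the set was built.
        System.out.println(isStopWord(lowered, "THE", true));
        System.out.println(isStopWord(asGiven, "THE", true));
    }
}
```

With explicit flags on both sides, the contract is visible in the signatures: a lowercasing lookup only matches reliably if the set was also built with lowercasing, which is the mismatch the aaa/bbb example above is worried about.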