From: "Jong Kim"
Subject: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
Date: Sun, 8 Jul 2007 17:12:08 -0400

Hi,

The MoreLikeThis class in Lucene's contrib/queries project filters noise words by comparing each term, case-sensitively, against the user-supplied stopwords set. I need this comparison to be case-insensitive, but I don't see any way to achieve that by extending the class.

I would have created a subclass of MoreLikeThis and overridden the isNoiseWord() method. The problem is that neither isNoiseWord() nor the instance variables it references are declared protected; they are all private.

Has anyone solved this problem without modifying and rebuilding the MoreLikeThis class directly?

An alternative approach would be to supply a stopwords list containing every variant of each stop word in all possible mixed cases. Needless to say, that isn't likely to be a workable solution for many applications.

Ultimately, it would be nice if those methods and variables were made protected, so that applications could override some of the default behavior without modifying the class directly.

Any help would be appreciated.

Thanks
/Jong
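One possible way around the "every mixed-case variant" problem, without subclassing, is sketched below. It assumes (not confirmed in this post) that MoreLikeThis accepts the stopwords as a java.util.Set through a setter such as setStopWords(Set), and that the private isNoiseWord() check ends up calling Set.contains(term). If so, a TreeSet built with String.CASE_INSENSITIVE_ORDER makes that contains() call case-insensitive. The helper names below are hypothetical and the sketch does not depend on Lucene; it only demonstrates the Set behavior.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class CaseInsensitiveStopWords {

    // Builds a stopwords set whose contains() ignores case. TreeSet decides
    // membership with its comparator (not equals/hashCode), so "The", "THE"
    // and "the" are all treated as the same element.
    // Caveat: such a set is not consistent with equals, as the TreeSet javadoc
    // warns, so only rely on add()/contains(), not on Set.equals().
    static Set<String> caseInsensitiveStopWords(String... words) {
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(words));
        return set;
    }

    // Hypothetical stand-in for the kind of check the post describes:
    // a term is noise if the stopwords set contains it.
    // This is NOT the actual MoreLikeThis.isNoiseWord() code.
    static boolean isStopWord(Set<String> stopWords, String term) {
        return stopWords != null && stopWords.contains(term);
    }

    public static void main(String[] args) {
        Set<String> stop = caseInsensitiveStopWords("the", "and", "of");
        System.out.println(isStopWord(stop, "The"));    // true
        System.out.println(isStopWord(stop, "AND"));    // true
        System.out.println(isStopWord(stop, "Lucene")); // false
    }
}
```

If the assumptions above hold for your Lucene version, the set from caseInsensitiveStopWords(...) could be handed to MoreLikeThis in place of a plain HashSet; check the MoreLikeThis source first to confirm the membership test really goes through Set.contains().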