From: "Jason Polites"
Date: Sat, 2 Sep 2006 23:05:27 +1000
To: java-user@lucene.apache.org
Subject: Stop words in index

Hey all,

I am using the StandardAnalyzer with my own list of stop words (which is more comprehensive than the default list), and I expected this to keep those stop words out of the index when data is indexed with this analyzer.

However, I am seeing stop words in the term vector for documents indexed with this analyzer. Is this expected behaviour?

Is there any way I can force these stop words to be omitted from the index? Having them in the index is wreaking havoc with the term vector analysis I use to determine document similarity.

Thanks.
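One thing worth ruling out is case. As far as I know, StandardAnalyzer in Lucene 2.x runs StandardTokenizer, then StandardFilter, then LowerCaseFilter, and only then StopFilter, which does an exact lookup of each (already lowercased) token in the stop set. If a custom stop list contains capitalized entries, those words would pass straight through. The sketch below is plain Java, not Lucene code: the class `StopWordSketch` and its methods `analyze` and `termVector` are hypothetical names that only model that order of operations and show how a mixed-case stop list fails to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of a tokenize -> lowercase -> stop-filter chain.
 * This is NOT Lucene's implementation; it only mimics the order of operations.
 */
public class StopWordSketch {

    /** Splits on non-letter/non-digit runs, lowercases, then drops stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) continue;                 // leading delimiters yield an empty first element
            String token = raw.toLowerCase(Locale.ROOT); // lowercasing happens BEFORE the stop check
            if (!stopWords.contains(token)) {            // exact, case-sensitive set lookup
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Counts term frequencies in first-seen order: a crude stand-in for a term vector. */
    static Map<String, Integer> termVector(List<String> tokens) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        String text = "The index and THE analyzer";
        // A stop list written in mixed case never matches the lowercased tokens.
        Set<String> mixedCase = new HashSet<>(Arrays.asList("The", "And"));
        // The same list written in lowercase does match.
        Set<String> lowerCase = new HashSet<>(Arrays.asList("the", "and"));

        System.out.println(termVector(analyze(text, mixedCase))); // prints {the=2, index=1, and=1, analyzer=1}
        System.out.println(termVector(analyze(text, lowerCase))); // prints {index=1, analyzer=1}
    }
}
```

If the stop list turns out to be lowercase already, the next things I would check are whether the same analyzer instance is the one actually passed to the IndexWriter, and whether the field is indexed as tokenized at all.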