From: Otis Gospodnetic
Date: Sat, 2 Sep 2006 09:26:05 -0700 (PDT)
Subject: Re: Stop words in index
To: java-user@lucene.apache.org

They shouldn't be in the index. You must be using StandardAnalyzer incorrectly, or maybe you think you are using it but are really using something else.

Otis

----- Original Message ----
From: Jason Polites
To: java-user@lucene.apache.org
Sent: Saturday, September 2, 2006 9:05:27 AM
Subject: Stop words in index

Hey all,

I am using the StandardAnalyzer with my own list of stop words (which is more comprehensive than the default list), and my expectation was that this would omit these stop words from the index when data is indexed using this analyzer.

However, I am seeing stop words in the term vector for documents indexed with this analyzer. Is this expected behaviour? Is there any way I can force these stop words to be omitted from the index?

Having them in the index is wreaking havoc with term vector analysis to determine document similarity.

Thanks.
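
To make the expectation in this thread concrete, here is a minimal plain-Java sketch of the idea: stop words are dropped while the text is analyzed, so a term vector built from the analyzed tokens should never contain them. This is an illustration only and does not use the Lucene API. The class name, the `termVector` helper, the tokenizing rule, and the stop list are all assumptions made up for the example, not the poster's code or list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java illustration only; none of this is Lucene API.
final class StopWordTermVectorSketch {

    // Hypothetical custom stop list, standing in for the poster's own list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "with"));

    // Split on non-alphanumeric characters, lowercase each token,
    // drop stop words, then count the terms that remain.
    static Map<String, Integer> termVector(String text, Set<String> stopWords) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields one empty token
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (stopWords.contains(term)) {
                continue; // filtered during analysis, so never counted
            }
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {analyzer=1, built=1, index=1}
        System.out.println(termVector("The index is built with the analyzer", STOP_WORDS));
    }
}
```

If stop words still appear in a real term vector, one plausible thing to check, in line with the reply above, is whether the analyzer carrying the custom list is really the one handed to the indexing code.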