From: "Dan Climan"
Reply-To: java-user@lucene.apache.org
Subject: Highlighter, Term Positions and Stopwords
Date: Mon, 5 Dec 2005 20:32:12 -0800
Organization: KeepMedia, Inc.

Do stop filters create non-contiguous token positions?

I am interested in experimenting with the highlighter and using the
TokenSources.getTokenStream(TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous) method.

The javadocs for this method note that:

tokenPositionsGuaranteedContiguous - true if the token position numbers have no overlaps or gaps.

The example used for comparison to re-analyzing the text includes stopwords ("timings above were using a stemmer/lowercaser/stopword combo"). Does removing stopwords, by definition, mean that the remaining token positions are not contiguous? Is this still true if the query uses the same analyzer and filters out the same stopwords?

Thanks,
Dan
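
P.S. To make the question concrete, here is a minimal plain-Java sketch of the two behaviours I can imagine. It does not use any Lucene classes; the class name PositionGaps, the preserveGaps flag and the sample stopword set are all hypothetical and exist only for illustration. Which of the two behaviours the real stop filter has is exactly what I am asking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a hand-rolled model of token positions, not Lucene code.
class PositionGaps {

    // Returns the position of every token that survives stopword removal.
    // preserveGaps == true : a removed stopword still consumes a position,
    //                        so the next surviving token jumps ahead.
    // preserveGaps == false: removed stopwords leave no trace in the numbering.
    static List<Integer> positions(List<String> tokens, Set<String> stopwords,
                                   boolean preserveGaps) {
        List<Integer> result = new ArrayList<>();
        int position = -1;      // position of the last emitted token
        int pendingSkips = 0;   // stopwords dropped since the last emitted token
        for (String token : tokens) {
            if (stopwords.contains(token)) {
                if (preserveGaps) {
                    pendingSkips++;
                }
                continue;
            }
            position += 1 + pendingSkips;
            pendingSkips = 0;
            result.add(position);
        }
        return result;
    }

    // "Contiguous" as I read the javadoc: each position is exactly one more
    // than the previous one, so there are neither gaps nor overlaps.
    static boolean isContiguous(List<Integer> positions) {
        for (int i = 1; i < positions.size(); i++) {
            if (positions.get(i) != positions.get(i - 1) + 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "the", "quick", "fox", "jumped", "over", "the", "lazy", "dog");
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "over"));

        List<Integer> withGaps = positions(tokens, stopwords, true);
        List<Integer> withoutGaps = positions(tokens, stopwords, false);

        System.out.println(withGaps + " contiguous=" + isContiguous(withGaps));
        // prints "[1, 2, 3, 6, 7] contiguous=false"
        System.out.println(withoutGaps + " contiguous=" + isContiguous(withoutGaps));
        // prints "[0, 1, 2, 3, 4] contiguous=true"
    }
}
```

If the stop filter behaves like the first case, I would expect to have to pass false for tokenPositionsGuaranteedContiguous; if it behaves like the second, true would seem safe.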