From: Walter Underwood
To: "solr-user@lucene.apache.org"
Subject: EdgeNgramTokenFilter and positions
Date: Wed, 5 Sep 2012 17:51:06 +0000

In the analysis page, the n-grams produced by EdgeNgramTokenFilter are at sequential positions. This seems wrong, because an n-gram is associated with a source token at a specific position. It also really messes up phrase matches.

With the source text "fleen", these positions and tokens are generated:

1,fl
2,fle
3,flee
4,fleen

Is this a known bug? Fixed? I'm running 3.3.

wunder
--
Walter Underwood
Search Guy
wunder@chegg.com
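
To make the difference concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code: the helper name `edge_ngrams`, its `same_position` flag, and the gram sizes (2 to 5) are assumptions chosen only to reproduce the "fleen" example. It contrasts the sequential numbering seen on the analysis page with the numbering one would expect if every n-gram stayed at its source token's position.

```python
def edge_ngrams(tokens, min_gram=2, max_gram=5, same_position=True):
    """Return (position, gram) pairs for the front edge n-grams of each token.

    Hypothetical helper for illustration only; not part of Lucene or Solr.
    """
    out = []
    position = 0
    for token in tokens:
        # Prefixes of the token, from min_gram up to max_gram characters.
        longest = min(max_gram, len(token))
        grams = [token[:n] for n in range(min_gram, longest + 1)]
        if same_position:
            # Expected behavior: one position per source token, shared by
            # all of its grams (a position increment of 0 after the first).
            position += 1
            out.extend((position, gram) for gram in grams)
        else:
            # Observed behavior: every gram advances the position by one.
            for gram in grams:
                position += 1
                out.append((position, gram))
    return out


# Numbering as reported from the analysis page:
print(edge_ngrams(["fleen"], same_position=False))
# [(1, 'fl'), (2, 'fle'), (3, 'flee'), (4, 'fleen')]

# Numbering with all grams at the source token's position:
print(edge_ngrams(["fleen"], same_position=True))
# [(1, 'fl'), (1, 'fle'), (1, 'flee'), (1, 'fleen')]
```

With sequential numbering, the grams of a second token start at position 5 or later instead of position 2, which is why a phrase query over adjacent source tokens no longer lines up.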