From: Robert Muir
Date: Mon, 12 Dec 2011 07:21:23 -0500
Subject: Re: InvalidTokenOffsetsException in conjunction with highlighting and ICU folding and edgeNgrams
To: solr-user@lucene.apache.org, me@maxbeutel.de

On Mon, Dec 12, 2011 at 5:18 AM, Max wrote:
> The end offset remains 11 even after folding and transforming "æ" to
> "ae", which seems wrong to me.

End offsets refer to the *original text*, so this is correct. What is
wrong is EdgeNGramsFilter. See how it turns that 11 into a 12?

>
> I also stumbled upon https://issues.apache.org/jira/browse/LUCENE-1500
> which seems like a similar issue.
>
> Is there a workaround for that problem or is the field configuration wrong?

For now, don't use EdgeNGrams.

-- 
lucidimagination.com
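
To make the offset point concrete, here is a rough sketch in plain Python. It is not Lucene code. The sample word "anæsthetics", the Token class, and the fold, naive_edge_ngrams and check_offsets helpers are all made up for illustration, and the sketch assumes an edge n-gram filter that derives the end offset from the length of the folded term, which is one way an 11 can turn into a 12.

```python
from dataclasses import dataclass


@dataclass
class Token:
    term: str
    start: int  # offset into the ORIGINAL text
    end: int    # offset into the ORIGINAL text (exclusive)


def fold(token):
    """Toy stand-in for ICU folding: rewrites the term, keeps the offsets."""
    return Token(token.term.replace("æ", "ae"), token.start, token.end)


def naive_edge_ngrams(token, min_gram, max_gram):
    """Hypothetical edge n-gram filter that derives offsets from gram length.

    This is the assumed mistake: the gram length is measured on the folded
    term, but it is added to an offset into the original text.
    """
    grams = []
    for n in range(min_gram, min(max_gram, len(token.term)) + 1):
        grams.append(Token(token.term[:n], token.start, token.start + n))
    return grams


def check_offsets(text, tokens):
    """Mimics a highlighter sanity check: offsets must lie inside the text."""
    for t in tokens:
        if t.start < 0 or t.start > t.end or t.end > len(text):
            raise ValueError(
                f"token {t.term!r} has offsets {t.start}-{t.end}, "
                f"text length is {len(text)}"
            )


text = "anæsthetics"                      # 11 characters, made-up example
folded = fold(Token(text, 0, len(text)))  # term grows to 12 chars, end stays 11
grams = naive_edge_ngrams(folded, 1, 15)

check_offsets(text, [folded])             # fine: folding left the offsets alone
try:
    check_offsets(text, grams)            # the longest gram ends past the text
except ValueError as err:
    print("rejected:", err)
```

In this model the folded token passes the check because its end offset still points into the original 11-character text, while the longest gram is rejected because its end offset was computed from the 12-character folded term.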