From: Robert Muir
Date: Sat, 7 Sep 2013 08:39:31 -0400
Subject: Re: PositionLengthAttribute
To: java-user

On Sat, Sep 7, 2013 at 7:44 AM, Benson Margulies wrote:
> In Japanese, compounds are just decompositions of the input string. In
> other languages, compounds can manufacture entire tokens from thin
> air. In those cases, it's something of a question how to decide on the
> offsets. I think that you're right, eventually, insofar as there's
> some offset in the original that might as well be blamed for any given
> component.

Why change the offsets then? Offsets are for highlighting. Let the whole
compound be highlighted when it's a match in search results. It's
transparent and totally accurate as to what is happening: this is why we
do highlighting, to help the user make a relevance assessment about the
document, not to help the end user debug the analysis chain or anything
like that.
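
To make the suggestion concrete, here is a minimal sketch in plain Java of components inheriting the offsets of the whole compound. It deliberately does not use the Lucene API: the Token class, decompound(), and highlight() are hypothetical names invented for illustration, and the German example word and its split are assumptions, not anything from the thread.

```java
import java.util.ArrayList;
import java.util.List;

class CompoundOffsets {
    /** Hypothetical token: term text plus character offsets into the original input. */
    public static final class Token {
        public final String term;
        public final int startOffset;
        public final int endOffset;

        public Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Emits the compound followed by one token per component. Each component
     * keeps the compound's offsets, so no per-component offsets are invented,
     * even when a component's text does not occur literally in the input.
     */
    public static List<Token> decompound(Token compound, List<String> parts) {
        List<Token> out = new ArrayList<>();
        out.add(compound);
        for (String part : parts) {
            out.add(new Token(part, compound.startOffset, compound.endOffset));
        }
        return out;
    }

    /** Marks the span covered by the matching token with [ ] as a stand-in for highlighting. */
    public static String highlight(String text, Token match) {
        return text.substring(0, match.startOffset)
                + "[" + text.substring(match.startOffset, match.endOffset) + "]"
                + text.substring(match.endOffset);
    }

    public static void main(String[] args) {
        String text = "der Kirchturm ist alt";
        // "Kirchturm" occupies characters 4..13; "Kirche" is not a substring of it.
        Token compound = new Token("Kirchturm", 4, 13);
        List<Token> tokens = decompound(compound, List.of("Kirche", "Turm"));
        // A query that matches the component "Kirche" highlights the whole compound.
        System.out.println(highlight(text, tokens.get(1)));
    }
}
```

With this arrangement a hit on either component highlights the full word the user actually sees in the document, which is the behaviour argued for above.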