Message-ID: <480E8A34.6040608@gmail.com>
Date: Wed, 23 Apr 2008 10:00:36 +0900
From: Christian Wittern
To: solr-user@lucene.apache.org
Subject: Re: Highlighted field gets truncated
In-Reply-To: <1BCEB274-44F2-4561-89D0-E3244AE44366@gmail.com>

Mike Klaas wrote:
> On 19-Apr-08, at 3:02 AM, Christian Wittern wrote:
>> So it could be that the match is not part of the fragment? This
>> sounds a bit strange. Is there a way to make sure the fragment
>> contains the match other than returning the whole field and do the
>> fragmenting myself?
> [...]
> As you can see, only fragments containing a match are returned (note
> that there is very often multiple matches--you seemed to assume only
> one).
>

Mike, thank you for the clarification. Now I understand what went wrong in the example I looked at.

I am querying ngram-indexed data (Chinese text). A user enters two or three characters and expects them to be matched more or less as a substring.
The fragment I looked at contained only one of the characters (the other was cut off at the end), which is what made me wonder. From what you say, even adding quotation marks around the query will not prevent this from happening (in this case, it would simply obscure the match).

Are there any plans to improve the fragmentation algorithm? Or are there other workarounds?

All the best,

Christian
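
The cut-off described above can be reproduced with a small sketch. It is not Solr's or Lucene's actual fragmenter: the fixed-width windows, the function names (`bigrams`, `fixed_fragments`, `visible_match`), and the sample strings are all assumptions made for illustration, with Latin letters standing in for Chinese characters. It only shows how a multi-character match that straddles a fragment boundary ends up partly in one fragment and partly in the next.

```python
def bigrams(text):
    """Return overlapping character bigrams as (start_offset, token) pairs."""
    return [(i, text[i:i + 2]) for i in range(len(text) - 1)]


def fixed_fragments(text, size):
    """Split text into consecutive windows of `size` characters.

    Hypothetical stand-in for a fragmenter; real fragmenters are more elaborate.
    """
    return [(start, text[start:start + size])
            for start in range(0, len(text), size)]


def visible_match(text, query, size):
    """For each fragment overlapping the first occurrence of `query`,
    return (fragment, part of the match that the fragment contains)."""
    pos = text.find(query)
    if pos < 0:
        return []
    end = pos + len(query)
    result = []
    for start, frag in fixed_fragments(text, size):
        lo, hi = max(pos, start), min(end, start + len(frag))
        if lo < hi:  # this fragment overlaps the match
            result.append((frag, text[lo:hi]))
    return result


if __name__ == "__main__":
    # Letters stand in for individual Chinese characters.
    text = "ABCDEFGHIJ"
    query = "CDE"
    print(bigrams(query))
    for frag, part in visible_match(text, query, size=4):
        print(frag, "->", part)
```

With a window of four characters, the three-character match is split across two fragments, so neither fragment shows the whole match on its own.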