From: Walter Underwood
Date: Wed, 03 Jan 2007 17:35:04 -0800
Subject: Re: Better highlighting fragmenter
Reply-To: solr-user@lucene.apache.org
In-Reply-To: <3d2ce8cb0701031713s9e19b88s144b985cef502b5f@mail.gmail.com>

On 1/3/07 5:13 PM, "Mike Klaas" wrote:

> Generally, we should strive for a high-quality out-of-the-box
> highlighting in Solr. That might involve making things like better
> fragmenters and a few other tricks(*) the default setup, and providing
> a "quick & dirty" setting for speed demons.

I've implemented this before, once in Python and once in C, so I'd be
glad to take a look at it. I'm not sure I have time to do a lot of
implementation, but I'd sure be glad to help.

We tried several APIs and decided that the best was an array of String
with the odd elements containing the strings that needed highlighting.
That made it really easy to step through and wrap highlighted stuff
with the right markup, while properly escaping any angle brackets in
the source text.

I'm not sure how easy it is to handle that format in XSLT, but it
might be worth it. Embedded highlight markup just doesn't work.

wunder
--
Walter Underwood
Search Guru, Netflix
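
A rough sketch of the array-of-strings format described in the message, in Python. It assumes zero-based indexing, so the "odd elements" are indexes 1, 3, 5 and so on, and the even elements are plain text. The function name `render_fragments` and the `pre`/`post` parameters are made up for illustration; nothing here is Solr's actual API.

```python
from html import escape


def render_fragments(parts, pre="<em>", post="</em>"):
    """Join an alternating list of strings into escaped HTML.

    Assumption: even indexes (0, 2, ...) hold plain text and odd
    indexes (1, 3, ...) hold the text that should be highlighted.
    """
    out = []
    for i, text in enumerate(parts):
        # Escape the source text first, so angle brackets in the
        # document can never be mistaken for highlight markup.
        safe = escape(text)
        if i % 2 == 1:
            out.append(pre + safe + post)
        else:
            out.append(safe)
    return "".join(out)


if __name__ == "__main__":
    parts = ["Use a ", "<b>", " tag for ", "bold", " text."]
    print(render_fragments(parts))
```

Because the markup is added only after each piece has been escaped, the caller picks the wrapper tags and the source text stays intact, which is the main advantage over highlight markup embedded in the returned string.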