Subject: Re: Highlighting Output
From: Martin Owens
To: Tricia Williams
Cc: solr-user@lucene.apache.org
Date: Tue, 12 Aug 2008 10:42:27 -0400

I tried to post it myself but got the address wrong, so thanks for
re-posting.

The problem we have with highlighting outside of the indexer is that the
systems we use to store co-ords are based on the term string in one case
and on the specific term offset in another. Both break horribly when
trying to do interesting things with Solr/Lucene.

The only real way to do it is to store that term-based data with the
index. Otherwise we'd have to use the Lucene query parser to reparse the
search string and write our own searcher to search our custom XML co-ord
files. Most unsatisfactory.

P.S. I noticed that my original email had way too many spelling
mistakes, sorry about that.

Best Regards, Martin Owens

On Mon, 2008-08-11 at 17:43 -0600, Tricia Williams wrote:
> Martin,
>
> I've been over some of the same thoughts you present here in the last
> few years. The path of least resistance ended up being to deal with the
> highlighting portion of OCRed images outside of Solr. That's not to say
> it couldn't or shouldn't be done differently. I briefly even pursued a
> similar course of action evident in
> https://issues.apache.org/jira/browse/SOLR-386. This would make it
> easier if you wanted to write your own highlighter.
>
> I'm interested to see what others think of your suggestions. I've
> forwarded this to the solr-user list.
>
> Tricia