From: Doug Cutting
To: 'Lee Mallabone', lucene-user@jakarta.apache.org
Subject: RE: Context specific summary with the search term
Date: Fri, 19 Oct 2001 09:01:38 -0700

> From: Lee Mallabone [mailto:lee@grantadesign.com]
>
> This is something I also need to implement in the very near future. My
> current thoughts are to use a variant of Maik Schreiber's way of doing
> term highlighting in documents. See:
> http://www.iq-computing.de/lucene/highlight.htm
>
> Rather than highlight terms, I would just extract the first hit token,
> and a certain number of characters either side of it.
>
> This may not be the best approach, but it looks like the
> easiest method to get working. I'm also not sure how realistic it
> will be from a performance perspective, so if people have any
> alternative ideas, I'd be happy to collaborate on an implementation...

I think this is the best approach. Since you'll probably only be
displaying around ten hits at a time, the cost of re-tokenizing is fairly
small.

Please consider contributing your code when it is complete.

Doug
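
For anyone reading this in the archive, the idea discussed above (re-tokenize the stored text of a displayed hit, find the first token that matches a query term, and keep a fixed number of characters on either side) might be sketched roughly as follows. This is only an illustration in Python, not Lee's or Maik Schreiber's code and not a Lucene API: the function name `context_summary`, the `radius` parameter, and the regex tokenizer are all assumptions made for the example. A real implementation would presumably re-run the same analyzer that was used at index time.

```python
import re

# Assumed tokenizer: runs of word characters. A real system would reuse
# the analyzer that built the index so tokens line up with query terms.
TOKEN = re.compile(r"\w+")


def context_summary(text, query_terms, radius=40):
    """Return the text around the first token matching any query term.

    `radius` (hypothetical parameter) is the number of characters kept
    on each side of the hit token.
    """
    terms = {t.lower() for t in query_terms}
    for match in TOKEN.finditer(text):
        if match.group().lower() in terms:
            start = max(0, match.start() - radius)
            end = min(len(text), match.end() + radius)
            # Mark truncation on either side with an ellipsis.
            prefix = "..." if start > 0 else ""
            suffix = "..." if end < len(text) else ""
            return prefix + text[start:end] + suffix
    # No hit token found: fall back to the start of the document.
    end = min(len(text), 2 * radius)
    return text[:end] + ("..." if end < len(text) else "")
```

Because this would only be called for the handful of hits actually shown on a results page, the re-tokenizing cost stays proportional to the page size rather than to the number of matching documents.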