From: "Alex Vigdor (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Date: Tue, 18 Aug 2009 19:27:14 -0700 (PDT)
Subject: [jira] Issue Comment Edited: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
Message-ID: <1680631117.1250648834860.JavaMail.jira@brutus>
In-Reply-To: <385671505.1250642595175.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830 ]

Alex Vigdor edited comment on LUCENE-1824 at 8/18/09 7:25 PM:
--------------------------------------------------------------

Actually, a couple of the existing tests specifically check for the faulty behavior. The attached patch for SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch. For example, where the prior test looked for "ssing speed", it now looks for " processing speed". While fixing the tests I noticed an off-by-one error in the original patch, which I have updated.

      was (Author: alexvigdor):
    Actually a couple of the existing tests specifically check for the faulty behavior - the attached patch for SimpleFragmentsBuilderTest tests for the non-truncating behavior implemented in the patch. For example, where the prior test looked for "ssing speed", it now looks for " processing speed".

> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1824-test.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when building fragments, so in most cases the first and last words of a fragment are truncated. This makes the highlights less legible than they should be. I will attach a patch to BaseFragmentsBuilder that resolves this by expanding the start and end boundaries of the fragment to the first whitespace character on either side of the fragment, or to the beginning or end of the source text, whichever comes first.
> This significantly improves legibility, at the cost of returning a slightly larger number of characters than specified for the fragment size.
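
For illustration only, the boundary expansion described in the issue might be sketched as below. This is not the attached patch: the class name FragmentBoundaries and the method expandToWhitespace are hypothetical, and the sketch assumes an exclusive end offset and that the whitespace character found on the left is kept in the fragment (which would be consistent with the test expecting " processing speed").

```java
// Hypothetical helper, not part of Lucene and not the LUCENE-1824 patch.
final class FragmentBoundaries {

    /**
     * Widens [start, end) so that it does not cut a word in half.
     * Returns {newStart, newEnd}; newEnd is exclusive.
     */
    static int[] expandToWhitespace(String source, int start, int end) {
        // Clamp the requested offsets to the source text.
        int s = Math.max(0, Math.min(start, source.length()));
        int e = Math.max(s, Math.min(end, source.length()));

        // Move the start left until it sits on a whitespace character
        // or reaches the beginning of the source text.
        while (s > 0 && !Character.isWhitespace(source.charAt(s))) {
            s--;
        }
        // Move the end right until the next character is whitespace
        // or the end of the source text is reached.
        while (e < source.length() && !Character.isWhitespace(source.charAt(e))) {
            e++;
        }
        return new int[] { s, e };
    }

    public static void main(String[] args) {
        String text = "the fast processing speed of Lucene";
        // Offsets 14..25 cover the truncated fragment "ssing speed".
        int[] b = expandToWhitespace(text, 14, 25);
        System.out.println("[" + text.substring(b[0], b[1]) + "]");
    }
}
```

As the issue notes, a fragment widened this way can be somewhat longer than the requested fragment size, since each side may grow by up to one word.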