Date: Fri, 9 Oct 2009 12:39:31 -0700 (PDT)
From: "Chas Emerick (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive

    [ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764168#action_12764168 ]

Chas Emerick commented on LUCENE-1822:
--------------------------------------

Thank you for the patch. I agree that the context surrounding each fragment could definitely be improved.

> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1822
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1822
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>         Attachments: LUCENE-1822.patch
>
>
> The new FastVectorHighlighter performs extremely well; however, I've found
> in testing that the window of text chosen per fragment is often very poor,
> because SimpleFragListBuilder is hard-coded to always start 6 characters to
> the left of the first phrase match in a fragment. When selecting long
> fragments, this often means that there is barely any context before the
> highlighted word and lots after. Even worse, when highlighting a phrase at
> the end of a short text, the beginning is cut off, even though the entire
> phrase would fit in the specified fragCharSize. For example, highlighting
> "Punishment" in "Crime and Punishment" returns "e and Punishment" no matter
> what fragCharSize is specified.
>
> I am going to attach a patch that improves the text window selection by
> recalculating the starting margin once all phrases in the fragment have
> been identified. This way, if a single word is matched in a fragment, it
> will appear in the middle of the highlight instead of 6 characters from the
> beginning. One can also guarantee that short texts are represented in their
> entirety in a fragment by specifying a large enough fragCharSize.
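The issue contrasts two window-selection rules. Below is a minimal Java sketch of both, assuming a fragment is described by the [start, end) character offsets of its matched phrases. The class FragmentWindowSketch and its methods naiveWindow and centeredWindow are hypothetical names for illustration only; this is not the SimpleFragListBuilder source or the attached LUCENE-1822.patch, just one way to realize "recalculating the starting margin once all phrases in the fragment have been identified".

```java
import java.util.List;

/**
 * Hypothetical illustration of fragment-window selection.
 * Not the Lucene SimpleFragListBuilder code or the LUCENE-1822 patch.
 */
public class FragmentWindowSketch {

    /** Fixed margin matching the behavior described in the issue. */
    static final int FIXED_MARGIN = 6;

    /** Naive rule: always start 6 chars left of the first match. */
    static String naiveWindow(String text, int firstMatchStart, int fragCharSize) {
        int start = Math.max(0, firstMatchStart - FIXED_MARGIN);
        int end = Math.min(text.length(), start + fragCharSize);
        return text.substring(start, end);
    }

    /**
     * Centered rule (assumed): once all matched phrases are known, split the
     * leftover space evenly on both sides of the matched span, then slide the
     * window left if it would run past the end of the text.
     */
    static String centeredWindow(String text, List<int[]> phrases, int fragCharSize) {
        int spanStart = Integer.MAX_VALUE;
        int spanEnd = 0;
        for (int[] p : phrases) {
            spanStart = Math.min(spanStart, p[0]);
            spanEnd = Math.max(spanEnd, p[1]);
        }
        // Half of the unused window goes before the matched span.
        int margin = Math.max(0, (fragCharSize - (spanEnd - spanStart)) / 2);
        int start = Math.max(0, spanStart - margin);
        // Keep the window inside the text so short texts are not cut off.
        if (start + fragCharSize > text.length()) {
            start = Math.max(0, text.length() - fragCharSize);
        }
        int end = Math.min(text.length(), start + fragCharSize);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int matchStart = text.indexOf("Punishment");
        int matchEnd = matchStart + "Punishment".length();
        List<int[]> phrases = List.of(new int[] {matchStart, matchEnd});
        for (int size : new int[] {20, 100}) {
            System.out.println(size + " naive:    [" + naiveWindow(text, matchStart, size) + "]");
            System.out.println(size + " centered: [" + centeredWindow(text, phrases, size) + "]");
        }
    }
}
```

With "Crime and Punishment", the naive rule yields "e and Punishment" for every fragCharSize, while the centered rule returns the whole title once fragCharSize is at least 20. In a longer text such as "one two three Punishment four five six" with fragCharSize 20, the centered rule puts the match near the middle ("hree Punishment four") rather than 6 characters from the start.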