From: Michael Imbeault
Date: Sat, 16 Sep 2006 15:04:27 -0400
To: java-user@lucene.apache.org
Subject: Better Highligther fragmenter?
I'm now using the excellent Highlighter from within Solr, and it works very well, except that the generated fragments sometimes begin with bad-looking characters (the "." ending the previous sentence, a ")", "/10", etc.). The same is true for fragment endings.

I looked through both the Lucene dev and user lists for a better Fragmenter class, but it seems there is none right now (just the simple and null fragmenters). To me the 'simple' fragmenter is a bit too simple; has anyone had success implementing a more intelligent one? Sadly, I have no Java coding experience, so I don't know where to begin on this one.

I don't think fancy phrase recognition is needed, just a better boundary algorithm (avoid beginning or ending fragments with bad-looking characters) and the addition of "..." at the beginning and end of a fragment when a sentence was cut.

Also, is it required that the highlighted field be 'stored'? I'm pretty sure it is, but I'd like confirmation.

Thanks,

--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
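A minimal, self-contained sketch of the boundary logic asked for above, in Java since that is Lucene's language. It is not a Lucene Fragmenter and uses none of the Highlighter API; the class `FragmentTrimmer` and its `trim` method are hypothetical names. It assumes you already have the original text plus the start/end character offsets of a raw fragment. It snaps both ends to whitespace, drops stray leading `.,;:)]}` and trailing `,;:([{`, and adds "..." when the cut falls inside a sentence, judged crudely by whether the neighboring non-space character is `.`, `!` or `?`.

```java
/**
 * Hypothetical helper, NOT part of Lucene's Highlighter API: cleans up a raw
 * fragment given its character offsets into the original text.
 */
public final class FragmentTrimmer {

    private FragmentTrimmer() {}

    // Punctuation that looks wrong at the start of a fragment (assumed list).
    private static boolean isBadLeading(char c) {
        return ".,;:)]}".indexOf(c) >= 0;
    }

    // Punctuation that looks wrong at the end of a fragment (assumed list).
    private static boolean isBadTrailing(char c) {
        return ",;:([{".indexOf(c) >= 0;
    }

    private static boolean endsSentence(char c) {
        return c == '.' || c == '!' || c == '?';
    }

    /** Returns text[start, end) snapped to word boundaries, with "..." where a sentence was cut. */
    public static String trim(String text, int start, int end) {
        if (start < 0 || end > text.length() || start >= end) {
            throw new IllegalArgumentException("bad fragment offsets");
        }
        int s = start;
        int e = end;

        // Starts mid-token (e.g. "/10" from "3/10", or "." from "weak."): skip to the next whitespace.
        if (s > 0 && !Character.isWhitespace(text.charAt(s - 1))) {
            while (s < e && !Character.isWhitespace(text.charAt(s))) {
                s++;
            }
        }
        // Skip leading whitespace and stray closing punctuation.
        while (s < e && (Character.isWhitespace(text.charAt(s)) || isBadLeading(text.charAt(s)))) {
            s++;
        }

        // Ends mid-token: back up to the previous whitespace.
        if (e < text.length() && !Character.isWhitespace(text.charAt(e))) {
            while (e > s && !Character.isWhitespace(text.charAt(e - 1))) {
                e--;
            }
        }
        // Drop trailing whitespace and dangling opening punctuation.
        while (e > s && (Character.isWhitespace(text.charAt(e - 1)) || isBadTrailing(text.charAt(e - 1)))) {
            e--;
        }

        // Nothing sensible left: fall back to the raw fragment.
        if (s >= e) {
            return text.substring(start, end).trim();
        }

        // "..." before, unless the fragment begins the text or follows a sentence end.
        int p = s - 1;
        while (p >= 0 && Character.isWhitespace(text.charAt(p))) {
            p--;
        }
        boolean cutBefore = p >= 0 && !endsSentence(text.charAt(p));
        // "..." after, unless the fragment ends the text or ends a sentence.
        boolean cutAfter = e < text.length() && !endsSentence(text.charAt(e - 1));

        return (cutBefore ? "..." : "") + text.substring(s, e) + (cutAfter ? "..." : "");
    }

    public static void main(String[] args) {
        String text = "Results were weak. Scores averaged 3/10 (n=12), which was expected.";
        // Raw fragment starts on the "." that ends the first sentence.
        System.out.println(trim(text, 17, 34)); // Scores averaged...
        // Raw fragment starts on "/10" and ends on a comma.
        System.out.println(trim(text, 36, 47)); // ...(n=12)...
    }
}
```

Plugging this into the Highlighter itself (for example inside a custom Fragmenter, or as a post-processing step that still has the fragment offsets) depends on the Lucene/Solr version and is not shown here.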