lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Imbeault <>
Subject Better Highligther fragmenter?
Date Sat, 16 Sep 2006 19:04:27 GMT
I'm now using the excellent Hightlighter from within Solr and it works 
very well; except that the generated fragments sometimes begins with 
bad-looking characters (the "." of the end of the previous phrase, or a 
), /10, etc). The same is true for the fragments ends. I looked at both 
the dev and user lucene list in search for a better Fragmenter class, 
but it seems that there's none right now (just the simple and null 

To me the 'simple' fragmenter is a bit too simple; anyone had success in 
implementing a more intelligent one? I have no java coding experience, 
sadly, so I don't know where to begin on this one. I don't think fancy 
phrase recognition is needed; just a better boundary algorithm (avoid 
beginning / ending fragments with bad looking characters) and the 
addition of "..." at the end and beginning of the fragment if 
fragmentation of a phrase took place.

Also, is it required that the highlighted field is 'stored'? I'm pretty 
sure it is, but just want confirmation.


Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message