lucene-java-user mailing list archives

From Michael Imbeault <michael.imbea...@sympatico.ca>
Subject Better Highlighter fragmenter?
Date Sat, 16 Sep 2006 19:04:27 GMT
I'm now using the excellent Highlighter from within Solr and it works 
very well, except that the generated fragments sometimes begin with 
bad-looking characters (the "." that ends the previous sentence, or a 
")", "/10", etc.). The same is true of the fragment ends. I searched both 
the dev and user Lucene lists for a better Fragmenter class, 
but it seems there is none right now (just the simple and null 
fragmenters).

To me the 'simple' fragmenter is a bit too simple; has anyone had success in 
implementing a more intelligent one? I have no Java coding experience, 
sadly, so I don't know where to begin on this one. I don't think fancy 
sentence recognition is needed; just a better boundary algorithm (avoid 
beginning / ending fragments with bad-looking characters) and the 
addition of "..." at the beginning and end of the fragment if 
a sentence was cut by the fragmentation.
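
To make the idea concrete, here is a rough sketch of the kind of boundary 
clean-up I mean, written as plain Java that post-processes a fragment string 
after the Highlighter has produced it. It is not a Lucene Fragmenter and does 
not use any Lucene API; the class name FragmentCleaner and the method clean 
are hypothetical, and the rules for what counts as a "good" first or last 
character are assumptions. It guesses that a sentence was cut at the start 
when no ".", "!" or "?" was trimmed from the front, so a fragment taken from 
the very beginning of a field would wrongly get a leading "...".

```java
/**
 * Hypothetical helper (not part of Lucene or Solr): tidies the edges of a
 * highlighted fragment and adds "..." where a sentence appears to be cut.
 */
final class FragmentCleaner {

    private static final String ELLIPSIS = "...";

    private FragmentCleaner() {
    }

    /** Returns the fragment with bad-looking edge characters removed. */
    static String clean(String fragment) {
        int start = 0;
        int end = fragment.length();
        boolean trimmedTerminator = false;

        // Skip leading characters until a letter, digit or tag opener.
        while (start < end && !isGoodStart(fragment.charAt(start))) {
            if (isTerminator(fragment.charAt(start))) {
                // The previous sentence ended here, so ours starts cleanly.
                trimmedTerminator = true;
            }
            start++;
        }

        // Drop trailing characters until a letter, digit, tag closer
        // or sentence terminator.
        while (end > start && !isGoodEnd(fragment.charAt(end - 1))) {
            end--;
        }

        if (start == end) {
            return ""; // nothing readable was left
        }

        String body = fragment.substring(start, end);
        boolean endsSentence = isTerminator(body.charAt(body.length() - 1));

        StringBuilder out = new StringBuilder();
        if (!trimmedTerminator) {
            out.append(ELLIPSIS); // assumed to start mid-sentence
        }
        out.append(body);
        if (!endsSentence) {
            out.append(ELLIPSIS); // cut before the sentence ended
        }
        return out.toString();
    }

    private static boolean isTerminator(char c) {
        return c == '.' || c == '!' || c == '?';
    }

    // '<' is allowed so that highlight tags such as <B> are kept.
    private static boolean isGoodStart(char c) {
        return Character.isLetterOrDigit(c) || c == '<';
    }

    // '>' is allowed so that closing highlight tags such as </B> are kept.
    private static boolean isGoodEnd(char c) {
        return Character.isLetterOrDigit(c) || c == '>' || isTerminator(c);
    }
}
```

With this sketch, a fragment such as ") was observed in <B>mice</B>. " would 
come out as "...was observed in <B>mice</B>.". It still has obvious gaps: a 
leading "/10" only loses its slash, and a balanced closing ")" at the end of a 
fragment is trimmed as well.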

Also, is it required that the highlighted field is 'stored'? I'm pretty 
sure it is, but just want confirmation.

Thanks,

-- 
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212



