lucene-java-user mailing list archives

From Scott Smith <ssm...@mainstreamdata.com>
Subject Highlighting html pages
Date Wed, 24 Oct 2012 00:00:42 GMT
I need to take an HTML page that I retrieve from my Lucene search and highlight all of the
terms that are part of the search. I need to skip over any HTML tags, since I don't want
words inside tags that happen to match the search to be highlighted.

Note that I don't want fragments of the document. I need to highlight all terms in the document
(with a <span> or something similar) and get back the entire document (with the new
<span>s) so it can be displayed in its entirety with the search terms highlighted.
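
To make the requirement concrete, here is a minimal sketch in plain Java. It is an illustration
only, not a Lucene API: the class name HtmlTermHighlighter, the method highlight, and the CSS
class "hit" are all hypothetical. It assumes the search terms are already available as lowercase
strings, that no tag contains a ">" inside an attribute value, and that the page has no
<script>/<style> bodies, comments, or character entities that need special handling.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTermHighlighter {
    // Group 1: a complete tag such as <p class="x">. Group 2: a run of letters/digits.
    // Tags are tried first, so words inside a tag are never seen as group 2.
    private static final Pattern TOKEN =
            Pattern.compile("(<[^>]*>)|([\\p{L}\\p{N}]+)");

    /** Returns the whole document with matching words wrapped in a span. */
    public static String highlight(String html, Set<String> lowerCaseTerms) {
        StringBuilder out = new StringBuilder(html.length() + 64);
        Matcher m = TOKEN.matcher(html);
        int last = 0;
        while (m.find()) {
            // Copy whitespace and punctuation between tokens unchanged.
            out.append(html, last, m.start());
            String word = m.group(2);
            if (word != null && lowerCaseTerms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append("<span class=\"hit\">").append(word).append("</span>");
            } else {
                // A tag, or a word that is not a search term: copy as-is.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }
}
```

With the term "lucene", the input <p class="lucene">Lucene rocks</p> keeps its class attribute
untouched and only the visible word "Lucene" gets wrapped. The sketch matches exact lowercase
words, so it ignores whatever stemming or other analysis the index applies.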

Last time I did this (in the days of 1.4.2 - so a while ago), I had to write a custom tokenizer
that skipped over the HTML tags so that I didn't accidentally highlight them. I'm hoping
that there is an easier way to do this now.

Suggestions?
