lucene-java-user mailing list archives

From Raf <r.ventag...@gmail.com>
Subject How to extract only highlight spans?
Date Tue, 03 Jul 2012 19:22:40 GMT
Hi,
Is it possible to use the Lucene Highlighter classes to extract highlight
spans instead of getting the "highlighted" string?
I am using Lucene 3.0.3 (and I cannot upgrade for now).

I have the following snippet of code:

QueryScorer scorer = new QueryScorer(highlightQuery);  // already rewritten
scorer.init(tokenStream);
tokenStream.reset();

Highlighter highlighter = new Highlighter(formatter, scorer);
highlighter.setTextFragmenter(fragmenter); // a NullFragmenter
String bestFragments = highlighter.getBestFragments(tokenStream,
        textToHighlight, maxNumFragments, fragmentsSeparator);

This returns the highlighted text (with HTML span tags in it).

Instead, I would like to get only a list of "spans" (e.g. <4,10>
<15,27> ...) that correspond to the text positions to highlight (the same
positions read by tokenStream).
I need them because I have to merge the Lucene query highlights with some
custom highlight info (already expressed as start/end spans), and it is very
difficult to merge the two when Lucene gives me only the highlighted text.
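
To make the merge step concrete, here is a minimal sketch of what I mean by
merging two sets of start/end spans. It is plain Java with no Lucene
dependency. The names SpanMerger, Span and merge are hypothetical, and
treating spans as half-open [start, end) ranges that are joined when they
overlap or touch is an assumption, not something Lucene prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not part of Lucene): merges two lists of character spans. */
final class SpanMerger {

    /** A character range, assumed half-open: [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            if (start > end) {
                throw new IllegalArgumentException("start must not exceed end");
            }
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "<" + start + "," + end + ">";
        }
    }

    /**
     * Returns the union of both span lists, sorted by start position,
     * with overlapping or touching spans collapsed into one.
     */
    static List<Span> merge(List<Span> a, List<Span> b) {
        List<Span> all = new ArrayList<Span>(a);
        all.addAll(b);

        // Sort by start, then by end, so one left-to-right pass is enough.
        Collections.sort(all, new Comparator<Span>() {
            public int compare(Span x, Span y) {
                if (x.start != y.start) {
                    return Integer.compare(x.start, y.start);
                }
                return Integer.compare(x.end, y.end);
            }
        });

        List<Span> merged = new ArrayList<Span>();
        for (Span s : all) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && s.start <= merged.get(lastIndex).end) {
                // Overlaps or touches the previous span: extend it.
                Span last = merged.remove(lastIndex);
                merged.add(new Span(last.start, Math.max(last.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }
}
```

For example, merging the spans <4,10> <15,27> with made-up custom spans
<8,12> <30,35> would give <4,12> <15,27> <30,35>. The missing piece is
getting the first list out of Lucene.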

Is there a way to extract this information using only the user query, the
text to highlight and the token stream of the search field?

Thank you in advance.

Bye
*Raf*
