lucene-java-user mailing list archives

From Paul Hill <p...@metajure.com>
Subject Hit Highlighting which highlighter to use?
Date Wed, 04 Apr 2012 20:06:11 GMT
Using the original org.apache.lucene.search.highlight.Highlighter, should I be able to give
it a query like [ My AND Words AND "My Words"^100 ] (the actual phrase in this query is
converted to a span query with a slop of 1)
and expect it to find the fragment many pages into the file that has the span "My Words" and rank
it better than fragments earlier in the document with "My" and "Words" (or lots of "My" and
"Words")?

I ask because currently I'm not getting the fragment with the phrase as the best fragment,
and I go through some hacky post-processing to look down the list for a "better" match, but
I'm wondering if we have the HitHighlighter wired up wrong.
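
A minimal sketch of the kind of post-processing described above, assuming the fragments are
available as plain strings in the highlighter's ranked order. The class and method names are
hypothetical and are not part of Lucene; the exact-substring test is also an assumption and
ignores analysis (stemming, slop).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PhraseFragmentPicker {
    // Hypothetical helper, not a Lucene API: walk the ranked fragments and
    // return the first one containing the exact phrase (case-insensitive).
    // Falls back to the top-ranked fragment, or null if the list is empty.
    static String pickBest(List<String> rankedFragments, String phrase) {
        if (rankedFragments.isEmpty()) {
            return null;
        }
        String needle = phrase.toLowerCase(Locale.ROOT);
        for (String fragment : rankedFragments) {
            if (fragment.toLowerCase(Locale.ROOT).contains(needle)) {
                return fragment;
            }
        }
        return rankedFragments.get(0);
    }

    public static void main(String[] args) {
        // Fragments in the order the highlighter ranked them (made-up text).
        List<String> fragments = Arrays.asList(
            "My notes and my other words",
            "as described in My Words, chapter 3");
        System.out.println(pickBest(fragments, "My Words"));
    }
}
```

This only papers over the ranking; it does not explain why the phrase fragment scores lower.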

At this time, my index does not have offset and position vectors for all tokenized fields,
and the body "text" field just has positions.

I understand that FastVectorHighlighter is fast, but would it do a better job of finding the
phrase or span in the text if I added positions and offsets to text?

When highlighting the small fields like title, path, etc., should I add term vectors with positions
and offsets and use FastVectorHighlighter, or is it just not worth storing that extra information
just for highlighting?

-Paul


