lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: Extracting span terms using WeightedSpanTermExtractor
Date Thu, 07 Jul 2011 23:47:30 GMT

On Jul 7, 2011, at 5:14 PM, Jahangir Anwari wrote:

> I did noticed a strange issue though. When the query is just a
> PhraseQuery(e.g. "everlasting glory"), getWeightedSpanTerms() returns all
> the span terms along with their span positions. But when the query is a
> BooleanQuery containing phrase and non-phrase terms(e.g. "everlasting
> glory"+unity), getWeightedSpanTerms() returns all the span terms but the
> span positions are returned only for the phrase terms(i.e. "everlasting" and
> "glory"). Span positions for the non-phrase term(i.e. "unity") is empty. Any
> ideas why this could be happening?

Positions are only collected for "position sensitive" queries. The Highlighter framework that
I plugged this into already runs through the TokenStream one token at a time - to highlight
a TermQuery, there is no need to consult positions - just highlight every occurrence seen
while marching through the TokenStream. Which means there is no need to find those positions

If you are looking for those positions, here is a patch to calculate them for TermQuerys as
well. If you open a JIRA issue, seems like a reasonable option to add to the class.

Index: lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/
--- lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/
(revision 1143407)
+++ lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/
(working copy)
@@ -133,7 +133,7 @@
       extractWeightedSpanTerms(terms, sp);
     } else if (query instanceof TermQuery) {
-      extractWeightedTerms(terms, query);
+      extractWeightedSpanTerms(terms, new SpanTermQuery(((TermQuery)query).getTerm()));
     } else if (query instanceof SpanQuery) {
       extractWeightedSpanTerms(terms, (SpanQuery) query);
     } else if (query instanceof FilteredQuery) {

- Mark Miller

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message