lucene-dev mailing list archives

From "Goddard, Michael J." <>
Subject Question on highlighting of nested SpanQuery instances
Date Fri, 19 Feb 2010 19:50:19 GMT

I initially posted a version of this question to java-user, but I think it's more of a java-dev
question.  I haven't yet been able to work out why I'm seeing spurious highlighting with nested
SpanQuery instances.  To illustrate this, I added the code below to the HighlighterTest class
in lucene_2_9_1:

 * Ref:
public void testHighlightingNestedSpans2() throws Exception {

  String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
  //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works

  String fieldName = "SOME_FIELD_NAME";

  SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term(fieldName, "lucene")),
    new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);

  Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
    new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);

  String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting "
    + "and Lucene great <B>Hadoop</B> was";
  //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting "
  //  + "and the great <B>Hadoop</B> was";

  String observed = highlightField(query, fieldName, theText);
  System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");

  assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected,
    observed);
}

Is this an issue that's arisen before?  I've been reading through the source to QueryScorer,
WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and NearSpansOrdered, but haven't found
the solution yet.  Initially, I thought that the extractWeightedSpanTerms method in WeightedSpanTermExtractor
should be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't get me
too far.
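
One hypothesis that would be consistent with the output above (an assumption, not something confirmed in the 2.9.1 source) is that the highlighter records only the start and end positions of each matching outer span, and then highlights every occurrence of any query term that falls inside that range. The sketch below is a toy model of that idea in plain Java. It does not call Lucene at all: the class name SpanRangeHighlightSketch, the method highlightByRange, the whitespace tokenization, and the span range of positions 1 to 10 are all hypothetical choices made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model (not Lucene code) of highlighting driven only by a span's position range. */
public class SpanRangeHighlightSketch {

    /**
     * Wraps in <B>...</B> every token that is a query term AND whose position
     * lies in the inclusive range [spanStart, spanEnd]. It does not track which
     * occurrence of a term actually took part in the match.
     */
    static String highlightByRange(String text, Set<String> queryTerms,
                                   int spanStart, int spanEnd) {
        String[] tokens = text.split(" ");   // assumption: one position per whitespace token
        StringBuilder out = new StringBuilder();
        for (int pos = 0; pos < tokens.length; pos++) {
            if (pos > 0) {
                out.append(' ');
            }
            boolean isQueryTerm = queryTerms.contains(tokens[pos].toLowerCase(Locale.ROOT));
            boolean inSpan = pos >= spanStart && pos <= spanEnd;
            if (isQueryTerm && inSpan) {
                out.append("<B>").append(tokens[pos]).append("</B>");
            } else {
                out.append(tokens[pos]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<String>(Arrays.asList("lucene", "doug", "hadoop"));
        String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";

        // Assumed outer span: first "Lucene" (position 1) through "Hadoop" (position 10).
        // The second "Lucene" sits at position 8, inside that range.
        System.out.println(highlightByRange(theText, terms, 1, 10));
        // prints: The <B>Lucene</B> was made by <B>Doug</B> Cutting and <B>Lucene</B> great <B>Hadoop</B> was
    }
}
```

In this model the second "Lucene" is highlighted simply because it is a query term sitting between the first "Lucene" and "Hadoop". The variant text with "the" in place of the second "Lucene" has no extra query term inside the range, so it would come out as expected.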

Any suggestions are welcome.


