lucene-dev mailing list archives

From "Goddard, Michael J." <MICHAEL.J.GODD...@saic.com>
Subject Question on highlighting of nested SpanQuery instances
Date Fri, 19 Feb 2010 19:50:19 GMT
Hello,

I initially posted a version of this question to java-user, but I think it's more of a java-dev
question.  I haven't yet been able to work out why I'm seeing spurious highlighting with nested
SpanQuery instances.  To illustrate this, I added the code below to the HighlighterTest class
in lucene_2_9_1:

/*
 * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
 */
public void testHighlightingNestedSpans2() throws Exception {

  String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
  //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works okay

  String fieldName = "SOME_FIELD_NAME";

  SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term(fieldName, "lucene")),
    new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);

  Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
    new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);

  String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
  //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";

  String observed = highlightField(query, fieldName, theText);
  System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");

  assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected, observed);
}

Is this an issue that's arisen before?  I've been reading through the source to QueryScorer,
WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and NearSpansOrdered, but haven't found
the solution yet.  Initially, I thought that the extractWeightedSpanTerms method in WeightedSpanTermExtractor
should be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't get me
too far.
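
One way to reason about the symptom is the sketch below. It does not use Lucene. It only models an assumption: that the highlighter records the start and end position of each matching outer span and then highlights every query term whose position falls inside that range, without checking whether that particular occurrence took part in the match. The class SpanRangeSketch, the highlight helper, and the whitespace-based token positions are all hypothetical and are not taken from the Lucene source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SpanRangeSketch {

  // Hypothetical helper: wraps in <B> tags every query term whose token
  // position lies inside [spanStart, spanEnd], both bounds inclusive.
  // Assumption: positions are simple whitespace-token indexes.
  static String highlight(String text, Set<String> queryTerms, int spanStart, int spanEnd) {
    String[] tokens = text.split(" ");
    StringBuilder out = new StringBuilder();
    for (int pos = 0; pos < tokens.length; pos++) {
      if (pos > 0) {
        out.append(' ');
      }
      boolean isQueryTerm = queryTerms.contains(tokens[pos].toLowerCase());
      boolean insideSpan = pos >= spanStart && pos <= spanEnd;
      if (isQueryTerm && insideSpan) {
        out.append("<B>").append(tokens[pos]).append("</B>");
      } else {
        out.append(tokens[pos]);
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Set<String> terms = new HashSet<String>(Arrays.asList("lucene", "doug", "hadoop"));
    String text = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";
    // Token positions: The=0 Lucene=1 was=2 made=3 by=4 Doug=5 Cutting=6
    //                  and=7 Lucene=8 great=9 Hadoop=10 was=11
    // Under the assumption above, the outer span covers positions 1..10.
    System.out.println(highlight(text, terms, 1, 10));
  }
}
```

Under that assumption, the second "Lucene" at position 8 sits inside the range 1..10 and is highlighted even though it is not part of the ordered match. The alternate text has "the" at that position, which is not a query term, so nothing extra is highlighted there. Whether the real code in WeightedSpanTermExtractor behaves this way would still need to be confirmed against the source.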

Any suggestions are welcome.

Thanks.

  Mike
