lucene-dev mailing list archives

From "Michael Goddard (JIRA)" <>
Subject [jira] Commented: (LUCENE-2287) Unexpected terms are highlighted within nested SpanQuery instances
Date Mon, 01 Mar 2010 17:04:05 GMT


Michael Goddard commented on LUCENE-2287:

The backward compatibility break was adding

  public abstract Spans[] getSubSpans();

to the Spans class.  I had to do this to enable recursion over nested Spans, and it seemed
the natural way to go since NearSpansUnordered and NearSpansOrdered already had this method.
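
To illustrate the kind of recursion that getSubSpans() makes possible, here is a minimal
sketch. It does not use the Lucene classes: SubSpansSketch, Node, Term, and Near are
hypothetical stand-ins, and the positions are made up for illustration. The idea is only
that a composite span exposes its children, so a caller can walk down to the leaf term
spans instead of treating the outer span as one opaque range.

```java
import java.util.ArrayList;
import java.util.List;

class SubSpansSketch {

    // Hypothetical stand-in for a Spans-like type; NOT the Lucene API.
    static abstract class Node {
        abstract int start();
        abstract int end();              // exclusive
        abstract Node[] getSubSpans();   // empty array for a leaf
    }

    // Leaf: a single term occurrence at one position.
    static class Term extends Node {
        private final int position;
        Term(int position) { this.position = position; }
        int start() { return position; }
        int end() { return position + 1; }
        Node[] getSubSpans() { return new Node[0]; }
    }

    // Composite: an ordered group of sub-spans (assumed to be given in position order).
    static class Near extends Node {
        private final Node[] subSpans;
        Near(Node... subSpans) { this.subSpans = subSpans; }
        int start() { return subSpans[0].start(); }
        int end() { return subSpans[subSpans.length - 1].end(); }
        Node[] getSubSpans() { return subSpans; }
    }

    // Recurse through getSubSpans() and collect only the positions of leaf spans.
    static void collectLeafPositions(Node spans, List<Integer> out) {
        Node[] subs = spans.getSubSpans();
        if (subs.length == 0) {
            for (int p = spans.start(); p < spans.end(); p++) {
                out.add(p);
            }
            return;
        }
        for (Node sub : subs) {
            collectLeafPositions(sub, out);
        }
    }

    static List<Integer> leafPositions(Node spans) {
        List<Integer> out = new ArrayList<>();
        collectLeafPositions(spans, out);
        return out;
    }

    public static void main(String[] args) {
        Node inner = new Near(new Term(1), new Term(5));
        Node outer = new Near(inner, new Term(10));
        System.out.println(leafPositions(outer)); // prints [1, 5, 10]
    }
}
```

The outer node here covers positions 1 through 10, but the recursion reports only the three
leaf positions that actually took part in the match.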

> Unexpected terms are highlighted within nested SpanQuery instances
> ------------------------------------------------------------------
>                 Key: LUCENE-2287
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/highlighter
>    Affects Versions: 2.9.1
>         Environment: Linux, Solaris, Windows
>            Reporter: Michael Goddard
>            Priority: Minor
>         Attachments: LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch,
>   Original Estimate: 336h
>  Remaining Estimate: 336h
> I haven't yet been able to resolve why I'm seeing spurious highlighting in nested SpanQuery
> instances.  Briefly, the issue is illustrated by the second instance of "Lucene" being highlighted
> in the test below, when it doesn't satisfy the inner span.  There's been some discussion about
> this on the java-dev list, and I'm opening this issue now because I have made some initial
> progress on this.
> This new test, added to the HighlighterTest class in lucene_2_9_1, illustrates this:
> /*
>  * Ref:
>  */
> public void testHighlightingNestedSpans2() throws Exception {
>   // Problem case: "Lucene" appears a second time, after the inner span.
>   String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";
>   // Works okay:
>   //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was";
>   String fieldName = "SOME_FIELD_NAME";
>   // Inner span: "lucene" followed by "doug", slop 5, in order.
>   SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
>     new SpanTermQuery(new Term(fieldName, "lucene")),
>     new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);
>   // Outer span: the inner span followed by "hadoop", slop 4, in order.
>   Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
>     new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);
>   String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and Lucene great <B>Hadoop</B> was";
>   //String expected = "The <B>Lucene</B> was made by <B>Doug</B> Cutting and the great <B>Hadoop</B> was";
>   String observed = highlightField(query, fieldName, theText);
>   System.out.println("Expected: \"" + expected + "\"\n" + "Observed: \"" + observed + "\"");
>   assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected,
>     observed);
> }
> Is this an issue that's arisen before?  I've been reading through the source to QueryScorer,
> WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and NearSpansOrdered, but haven't found
> the solution yet.  Initially, I thought that the extractWeightedSpanTerms method in WeightedSpanTermExtractor
> should be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't get me
> too far.
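
One plausible way to picture the symptom in the test above is sketched below. This is not
the highlighter's code, and the message does not confirm the mechanism. The sketch assumes
plain whitespace tokenization with no stop-word removal, and assumes that a highlighter
marks every query term whose position falls inside the range of the outermost span.
SpanRangeSketch and termsInRange are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SpanRangeSketch {

    // Positions of tokens that are query terms and lie in [start, end).
    static List<Integer> termsInRange(String[] tokens, Set<String> terms, int start, int end) {
        List<Integer> hits = new ArrayList<>();
        for (int i = Math.max(start, 0); i < Math.min(end, tokens.length); i++) {
            if (terms.contains(tokens[i])) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";
        // Assumption: whitespace tokens, lowercased; positions start at 0.
        String[] tokens = text.toLowerCase(Locale.ROOT).split(" ");
        Set<String> queryTerms = new HashSet<>(Arrays.asList("lucene", "doug", "hadoop"));

        // Under these assumptions the outer span runs from the first "lucene"
        // (position 1) through "hadoop" (position 10), so its range is [1, 11).
        System.out.println(termsInRange(tokens, queryTerms, 1, 11)); // prints [1, 5, 8, 10]
    }
}
```

Position 8 is the second "Lucene": it sits inside the outer range but is not one of the
terms that formed the inner lucene/doug match at positions 1 and 5, which matches the
unexpected highlight the test reports.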

