lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-627) highlighter problems with overlapping tokens
Date Thu, 28 Sep 2006 19:17:51 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-627?page=comments#action_12438531 ] 
            
Mark Harwood commented on LUCENE-627:
-------------------------------------

Thanks for the test, Kerang.

I'm no longer clear on what the expected behaviour is here, or whether this is a test
that needs to pass.

It seems to conflict with the expected results of Yonik's test method "testOverlapAnalyzer2".
In that test, as in yours, there is a cluster of overlapping tokens with search terms matching
at its beginning and end. Yonik expects the whole cluster, from search term 1's start offset
to search term 2's end offset, to be surrounded by a single highlight tag; your test expects
two tags.

Who is right?

This is a snippet from Yonik's test:
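    // Excerpt from testOverlapAnalyzer2: getTS2() and s come from the test (not shown);
    // getTS2() supplies a token stream with overlapping tokens over the text s.
    query = new QueryParser("text", new WhitespaceAnalyzer()).parse("hi speed");
    highlighter = new Highlighter(new QueryScorer(query));
    result = highlighter.getBestFragments(getTS2(), s, 3, "...");
    assertEquals("<B>Hi-Speed</B>10 foo", result);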

and yours:

    String srchkey = "BC FG";
    String expectedResult = "A<B>BC</B>DE<B>FG</B>HIJ";
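To make the two readings concrete, here is a minimal sketch of both highlighting policies in plain Java. It is not the Highlighter's actual code, and the token streams are hypothetical: for Yonik's case I assume s is "Hi-Speed10 foo" with tokens hi, hispeed, speed, 10 and foo; for Kerang's case I assume the text "ABCDEFGHIJ" with overlapping tokens BC, BCDEFG, DE and FG. The class and method names (OverlapHighlightPolicies, highlight, yonikExample, kerangExample) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch contrasting two ways to highlight a cluster of overlapping tokens. */
public class OverlapHighlightPolicies {

    /** A token: its term and its [start, end) character offsets into the source text. */
    static final class Tok {
        final String term;
        final int start, end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Wraps matches in <B>...</B>. Tokens must be sorted by start offset.
     * mergeCluster = true:  one tag per cluster, from the first match's start to the last match's end.
     * mergeCluster = false: one tag per matching token (overlapping matches share one tag).
     */
    static String highlight(String text, List<Tok> toks, Set<String> terms, boolean mergeCluster) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < toks.size()) {
            // A cluster grows while the next token starts before the cluster's current end.
            int clusterEnd = toks.get(i).end;
            int j = i + 1;
            while (j < toks.size() && toks.get(j).start < clusterEnd) {
                clusterEnd = Math.max(clusterEnd, toks.get(j).end);
                j++;
            }
            int matchStart = -1, matchEnd = -1;
            for (int k = i; k < j; k++) {
                Tok t = toks.get(k);
                if (!terms.contains(t.term)) continue;
                if (mergeCluster) {
                    matchStart = (matchStart < 0) ? t.start : Math.min(matchStart, t.start);
                    matchEnd = Math.max(matchEnd, t.end);
                } else if (!spans.isEmpty() && spans.get(spans.size() - 1)[1] > t.start) {
                    int[] last = spans.get(spans.size() - 1);
                    last[1] = Math.max(last[1], t.end); // overlapping matches: extend previous tag
                } else {
                    spans.add(new int[] {t.start, t.end});
                }
            }
            if (mergeCluster && matchStart >= 0) spans.add(new int[] {matchStart, matchEnd});
            i = j;
        }
        // Copy each source character exactly once, so overlapping tokens are never duplicated.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] s : spans) {
            out.append(text, pos, s[0]).append("<B>").append(text, s[0], s[1]).append("</B>");
            pos = s[1];
        }
        return out.append(text.substring(pos)).toString();
    }

    // Hypothetical stream for Yonik's case; the text "Hi-Speed10 foo" is an assumption.
    static String yonikExample(boolean mergeCluster) {
        List<Tok> toks = List.of(new Tok("hi", 0, 2), new Tok("hispeed", 0, 8),
                new Tok("speed", 3, 8), new Tok("10", 8, 10), new Tok("foo", 11, 14));
        return highlight("Hi-Speed10 foo", toks, Set.of("hi", "speed"), mergeCluster);
    }

    // Hypothetical stream for Kerang's case: overlapping tokens over "ABCDEFGHIJ".
    static String kerangExample(boolean mergeCluster) {
        List<Tok> toks = List.of(new Tok("BC", 1, 3), new Tok("BCDEFG", 1, 7),
                new Tok("DE", 3, 5), new Tok("FG", 5, 7));
        return highlight("ABCDEFGHIJ", toks, Set.of("BC", "FG"), mergeCluster);
    }

    public static void main(String[] args) {
        System.out.println(yonikExample(true));   // <B>Hi-Speed</B>10 foo
        System.out.println(yonikExample(false));  // <B>Hi</B>-<B>Speed</B>10 foo
        System.out.println(kerangExample(true));  // A<B>BCDEFG</B>HIJ
        System.out.println(kerangExample(false)); // A<B>BC</B>DE<B>FG</B>HIJ
    }
}
```

Under these assumed streams, Yonik's expected result corresponds to the one-tag-per-cluster policy and Kerang's to the one-tag-per-term policy, so the two tests cannot both pass unless the highlighter is told which policy to use.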

I don't really have an opinion either way, so I'll turn it over to you.

Cheers
Mark




> highlighter problems with overlapping tokens
> --------------------------------------------
>
>                 Key: LUCENE-627
>                 URL: http://issues.apache.org/jira/browse/LUCENE-627
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Other
>    Affects Versions: 2.0.1
>            Reporter: Yonik Seeley
>             Fix For: 2.0.1
>
>         Attachments: highlight_overlap.diff, Highlighter.java.diff
>
>
> The Lucene highlighter has problems when tokens that overlap are generated.
> For example, if analysis of iPod generates the tokens "i", "pod", "ipod" (with pod and
> ipod in the same position), then the highlighter will output this as iipod, regardless
> of whether any of those tokens are highlighted.
> Discovered via http://issues.apache.org/jira/browse/SOLR-24
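To illustrate the general failure mode, here is a simplified sketch. It is not the Highlighter's actual code and does not reproduce the exact iipod output reported above. Assume the source text is "iPod" and the tokens i, pod and ipod arrive in that order with offsets [0,1), [1,4) and [0,4). Copying each token's slice of the source independently emits some characters twice, while tracking the furthest offset already emitted copies each character once. The class and method names (OverlapRebuild, naive, offsetAware) are hypothetical.

```java
/** Hypothetical illustration of how overlapping tokens can duplicate text when rebuilding output. */
public class OverlapRebuild {

    // Naive: copy gap text, then each token's full slice, even if those characters were already copied.
    static String naive(String text, int[][] tokens) {
        StringBuilder out = new StringBuilder();
        int lastEnd = 0;
        for (int[] t : tokens) {
            if (t[0] > lastEnd) out.append(text, lastEnd, t[0]); // gap text between tokens
            out.append(text, t[0], t[1]);                        // token slice, possibly a repeat
            lastEnd = Math.max(lastEnd, t[1]);
        }
        return out.append(text.substring(lastEnd)).toString();
    }

    // Offset-aware: remember how far the output has reached and copy only characters past that point.
    static String offsetAware(String text, int[][] tokens) {
        StringBuilder out = new StringBuilder();
        int emittedEnd = 0;
        for (int[] t : tokens) {
            if (t[1] <= emittedEnd) continue;   // token lies entirely inside text already emitted
            int from = Math.max(t[0], emittedEnd);
            out.append(text, emittedEnd, from); // gap text between tokens, if any
            out.append(text, from, t[1]);       // only the not-yet-emitted part of the token
            emittedEnd = t[1];
        }
        return out.append(text.substring(emittedEnd)).toString();
    }

    public static void main(String[] args) {
        // Assumed offsets for "i", "pod", "ipod" over the text "iPod".
        int[][] tokens = {{0, 1}, {1, 4}, {0, 4}};
        System.out.println(naive("iPod", tokens));       // iPodiPod
        System.out.println(offsetAware("iPod", tokens)); // iPod
    }
}
```

The naive version prints iPodiPod and the offset-aware version prints iPod; the exact duplicated text a real highlighter produces depends on how it orders and groups tokens.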

-- 
This message is automatically generated by JIRA.

        


