lucene-dev mailing list archives

From "Steven Rowe (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2587) Highlighter picks wrong offset for fragment boundaries
Date Thu, 03 Nov 2011 21:05:32 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143528#comment-13143528
] 

Steven Rowe commented on LUCENE-2587:
-------------------------------------

Hi Terje,

Can you upload your IMSentenceFragmenter.java file again, but this time click on the radio
button next to "Grant license to ASF for inclusion in ASF works (as per the Apache License
§5)"?

Thanks,
Steve
                
> Highlighter picks wrong offset for fragment boundaries
> ------------------------------------------------------
>
>                 Key: LUCENE-2587
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2587
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/highlighter
>    Affects Versions: 3.0.2
>         Environment: Java 6 + Lucene 3.0.2
>            Reporter: Terje Eggestad
>            Priority: Trivial
>              Labels: newdev
>         Attachments: IMSentenceFragmenter.java, LUCENE-2587.patch
>
>
> I have written a new Fragmenter, since we need fragments for hitlines to fall on sentence
> boundaries and not cross paragraphs.
> When using it with org.apache.lucene.search.highlight.Highlighter, I get hitlines that
> start with ". ", "? ", "! "...
> Consider the text "A b c d e. F g h i j! K l m n o. ",
> which becomes the token stream: (A) (b) (c) (d) (e) (F) (g) (h) (i) (j) (K) (l) (m) (n) (o)
> If the fragmenter returns isNewFragment() = true on F and K, and the Highlighter picks the
> middle fragment (let's say we search for "g"), the hitline becomes:
> ". F <B>g</B> h i j"
> The reason, it seems, is that the offset of a fragment boundary is found by taking the
> endOffset of the last token in the preceding fragment,
> not the startOffset of the first token in the new fragment.
> TJ
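
The offset arithmetic described in the report can be reproduced without Lucene. The
following is a minimal sketch, not the Highlighter's actual code: the Token class, the
letter-run tokenizer, and the "capitalized token starts a fragment" rule are all
hypothetical stand-ins chosen so that F and K open new fragments, as in the example above.
It only contrasts the two candidate boundary choices (endOffset of the previous token
versus startOffset of the first token of the new fragment).

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentOffsetDemo {

    /** Hypothetical stand-in for a token: term text plus character offsets. */
    static final class Token {
        final String term;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Emits one token per run of letters; punctuation and spaces yield no tokens. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isLetter(text.charAt(i))) {
                int start = i;
                while (i < text.length() && Character.isLetter(text.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(text.substring(start, i), start, i));
            } else {
                i++;
            }
        }
        return tokens;
    }

    /** Assumed rule for this sketch only: a capitalized token (other than the first) opens a fragment. */
    static boolean isNewFragment(Token token, int index) {
        return index > 0 && Character.isUpperCase(token.term.charAt(0));
    }

    /**
     * Cuts the text into fragments.
     * useStartOffset == false: a new fragment begins at the endOffset of the previous token
     *                          (the behaviour described in the report).
     * useStartOffset == true:  a new fragment begins at the startOffset of its first token.
     */
    public static List<String> fragments(String text, boolean useStartOffset) {
        List<Token> tokens = tokenize(text);
        List<String> result = new ArrayList<>();
        int fragmentStart = 0;
        int lastEnd = 0;
        for (int i = 0; i < tokens.size(); i++) {
            Token token = tokens.get(i);
            if (isNewFragment(token, i)) {
                result.add(text.substring(fragmentStart, lastEnd));
                fragmentStart = useStartOffset ? token.startOffset : lastEnd;
            }
            lastEnd = token.endOffset;
        }
        result.add(text.substring(fragmentStart, lastEnd));
        return result;
    }

    public static void main(String[] args) {
        String text = "A b c d e. F g h i j! K l m n o. ";
        System.out.println("[" + fragments(text, false).get(1) + "]"); // boundary at previous endOffset
        System.out.println("[" + fragments(text, true).get(1) + "]");  // boundary at first startOffset
    }
}
```

With the example text, the first choice makes the middle fragment begin with the ". " that
trails the previous sentence, while the second makes it begin at "F".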

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
