lucene-dev mailing list archives

From "Koji Sekiguchi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
Date Wed, 19 Aug 2009 04:34:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744872#action_12744872 ]

Koji Sekiguchi commented on LUCENE-1824:
----------------------------------------

Alex,
I don't have much time to look into this patch, but I understand the requirement.
I named it *Simple* FragmentsBuilder because it simply builds fragments without regard for
boundaries. I designed FragmentsBuilder to be pluggable so that other FragmentsBuilders
can be written and contributed, e.g. WhitespaceFragmentsBuilder, SentenceAwareFragmentsBuilder,
etc. I think adding new FragmentsBuilders (plus test cases) is better than modifying the existing
ones. When you write a new FragmentsBuilder, don't forget that some languages (CJK) don't use
periods or whitespace to mark word and sentence boundaries.
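
To illustrate the CJK point, here is a minimal sketch of boundary snapping that does not assume whitespace. It uses the JDK's java.text.BreakIterator; the class and method names (LocaleAwareBoundary, snapToWordBoundaries) are hypothetical and are not part of Lucene or of the attached patch. How well this segments CJK text depends on the JDK's break rules, so treat it as a starting point only.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: widens a fragment's offsets to the
// nearest word boundaries as reported by the JDK's BreakIterator.
final class LocaleAwareBoundary {

    /** Returns {start, end} moved outward to word boundaries for the given locale. */
    static int[] snapToWordBoundaries(String source, int start, int end, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(source);

        // Clamp the requested offsets into the valid range of the text.
        int s = Math.max(0, Math.min(start, source.length()));
        int e = Math.max(s, Math.min(end, source.length()));

        // If the start falls inside a word, move it back to the previous boundary.
        if (!it.isBoundary(s)) {
            s = it.preceding(s);
        }
        // If the end falls inside a word, move it forward to the next boundary.
        if (!it.isBoundary(e)) {
            e = it.following(e);
        }
        return new int[] { s, e };
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        int[] r = snapToWordBoundaries(text, 6, 13, Locale.ENGLISH);
        System.out.println(text.substring(r[0], r[1]));
    }
}
```

A SentenceAware variant could follow the same shape with BreakIterator.getSentenceInstance(locale) in place of the word instance.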


> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1824-test.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when building
> fragments, so in most cases the first and last words of a fragment are truncated. This
> makes the highlights less legible than they should be. I will attach a patch to BaseFragmentsBuilder
> that resolves this by expanding the start and end boundaries of the fragment to the first
> whitespace character on either side of the fragment, or to the beginning or end of the source
> text, whichever comes first. This significantly improves legibility, at the cost of returning
> slightly more characters than the specified fragment size.
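
The expansion described in the issue can be sketched as follows. This is a standalone illustration, not the code in LUCENE-1824.patch; the class and method names (WhitespaceBoundaryExpander, expand) are hypothetical, and whitespace is assumed to mean whatever Character.isWhitespace accepts.

```java
// Hypothetical helper, not taken from the attached patch: widens a fragment's
// [start, end) offsets outward to the nearest whitespace or to the text bounds.
final class WhitespaceBoundaryExpander {

    /** Returns {start, end} after expansion; end is exclusive. */
    static int[] expand(String source, int start, int end) {
        // Clamp the requested offsets into the valid range of the text.
        int s = Math.max(0, Math.min(start, source.length()));
        int e = Math.max(s, Math.min(end, source.length()));

        // Move the start left until the preceding character is whitespace
        // or the beginning of the text is reached.
        while (s > 0 && !Character.isWhitespace(source.charAt(s - 1))) {
            s--;
        }
        // Move the end right until it sits on whitespace
        // or the end of the text is reached.
        while (e < source.length() && !Character.isWhitespace(source.charAt(e))) {
            e++;
        }
        return new int[] { s, e };
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps";
        // The raw fragment [6, 13) is "ick bro", with both words truncated.
        int[] r = expand(text, 6, 13);
        System.out.println(text.substring(r[0], r[1])); // prints "quick brown"
    }
}
```

As the issue notes, the result can be longer than the requested fragment size: here a 7-character request grows to 11 characters.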

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



