lucene-dev mailing list archives

From "Alex Vigdor (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments
Date Wed, 19 Aug 2009 02:21:14 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744830#action_12744830 ]

Alex Vigdor edited comment on LUCENE-1824 at 8/18/09 7:19 PM:
--------------------------------------------------------------

Actually, a couple of the existing tests specifically check for the faulty behavior. The attached
LUCENE-1824-test.patch updates SimpleFragmentsBuilderTest to expect the non-truncating behavior
implemented in LUCENE-1824.patch.  For example, where the prior test looked for
"ssing <b>speed</b>", it now looks for " processing <b>speed</b>".


      was (Author: alexvigdor):
    Actually a couple of the existing tests specifically check for the faulty behavior - the
following modification of SimpleFragmentsBuilderTest tests for the non-truncating behavior
implemented in the patch.  A couple other tests in this file fail now (with the strings of
"a b b a" etc.), but they don't seem serious to me (i.e. I would think the tests could be
changed to test for the results they get from the patch).

Index: contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java
===================================================================
--- contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java	(revision 805400)
+++ contrib/fast-vector-highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java	(working copy)
@@ -90,7 +90,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use t",
+    assertEquals( " most <b>search engines</b> use only one of these methods. Even the <b>search engines</b> that says they can use the ",
         sfb.createFragment( reader, 0, F, ffl ) );
   }
 
@@ -103,7 +103,7 @@
     SimpleFragListBuilder sflb = new SimpleFragListBuilder();
     FieldFragList ffl = sflb.createFieldFragList( fpl, 100 );
     SimpleFragmentsBuilder sfb = new SimpleFragmentsBuilder();
-    assertEquals( "ssing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
+    assertEquals( " processing <b>speed</b>, the", sfb.createFragment( reader, 0, F, ffl ) );
   }
   
   public void testUnstoredField() throws Exception {

  
> FastVectorHighlighter truncates words at beginning and end of fragments
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1824
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>         Environment: any
>            Reporter: Alex Vigdor
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1824-test.patch, LUCENE-1824.patch
>
>
> FastVectorHighlighter does not take word boundaries into consideration when building
> fragments, so that in most cases the first and last word of a fragment are truncated.  This
> makes the highlights less legible than they should be.  I will attach a patch to BaseFragmentsBuilder
> that resolves this by expanding the start and end boundaries of the fragment to the first
> whitespace character on either side of the fragment, or the beginning or end of the source
> text, whichever comes first.  This significantly improves legibility, at the cost of returning
> a slightly larger number of characters than specified for the fragment size.
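
For readers without the attachment at hand, the boundary expansion described above can be
sketched roughly as follows. This is not the code from LUCENE-1824.patch: the class name
FragmentBoundarySketch and its methods are hypothetical, it works on a plain String rather
than on Lucene's fragment classes, and keeping the adjacent whitespace character inside the
fragment is an assumption based on the leading space in the expected test strings.

```java
// Hypothetical sketch of the idea in the issue description, not the actual patch.
final class FragmentBoundarySketch {

    // True only if i is a valid index of text and the character there is whitespace.
    private static boolean isWhitespaceAt(String text, int i) {
        return i >= 0 && i < text.length() && Character.isWhitespace(text.charAt(i));
    }

    // Step left until start sits on a whitespace character or reaches the text start.
    static int expandStart(String text, int start) {
        int s = start;
        while (s > 0 && !isWhitespaceAt(text, s)) {
            s--;
        }
        return s;
    }

    // Step right until the character just before end is whitespace or the text ends.
    static int expandEnd(String text, int end) {
        int e = end;
        while (e < text.length() && !isWhitespaceAt(text, e - 1)) {
            e++;
        }
        return e;
    }

    // Returns the fragment [start, end) widened to the nearest whitespace on each side.
    static String expand(String text, int start, int end) {
        if (start < 0 || end < start || end > text.length()) {
            throw new IllegalArgumentException("invalid fragment offsets");
        }
        return text.substring(expandStart(text, start), expandEnd(text, end));
    }

    public static void main(String[] args) {
        String text = "improve the processing speed, the rest follows";
        System.out.println("[" + text.substring(17, 31) + "]"); // raw, truncated fragment
        System.out.println("[" + expand(text, 17, 31) + "]");   // widened fragment
    }
}
```

As in the description, the widened fragment can be somewhat longer than the requested
fragment size, since each boundary may move by up to one word.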

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



