lucene-dev mailing list archives

From "Lance Norskog (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3110) Search result comes up with truncated words at the start of highlighted fragment
Date Wed, 30 May 2012 04:01:24 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285370#comment-13285370 ]

Lance Norskog commented on SOLR-3110:
-------------------------------------

This appeared in the mail thread a month later:
{quote}

Hi Koji,
  I am Shyam's coworker. After looking into this issue, I believe the
problem of chopped words has to do with the 'margin' field of the
org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder class.
It is set to 6 by default. My understanding is that a margin
value greater than zero results in a truncated word when the highlighted
term is too close to the beginning of a document. I was able to reset the
'margin' field by creating my own version of
org.apache.solr.highlight.SimpleFragListBuilder and passing zero for
'margin' when calling Lucene's SimpleFragListBuilder constructor. My
testing shows the problem has been fixed. Do you concur?

  Now a couple of questions. I am not sure what the purpose of this field
is; could you give the use case for it? Also, could it be exposed as a
parameter in Solr so that it could be set to some other value?

Thanks,

Koorosh
{quote}
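
For reference, a custom fragListBuilder like the one described in the quoted mail would be registered in solrconfig.xml roughly as follows. This is only a sketch: the class name com.example.ZeroMarginFragListBuilder and the name "zeroMargin" are hypothetical placeholders for whatever custom class passes a margin of zero to Lucene's SimpleFragListBuilder constructor, and neither ships with Solr.

```xml
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <!-- Hypothetical custom class standing in for the "custom version" of
         org.apache.solr.highlight.SimpleFragListBuilder described above.
         default="true" makes it the fragListBuilder used when a request
         does not name one. -->
    <fragListBuilder name="zeroMargin"
                     class="com.example.ZeroMarginFragListBuilder"
                     default="true"/>
  </highlighting>
</searchComponent>
```

The jar containing such a class would also need to be on Solr's classpath; whether a zero margin is the right fix for this issue is still an open question in the thread.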
                
> Search result comes up with truncated words at the start of highlighted fragment
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-3110
>                 URL: https://issues.apache.org/jira/browse/SOLR-3110
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.0
>         Environment: java Tomcat Solaris
>            Reporter: Shyam Bhaskaran
>              Labels: FastVectorHighlighter, boundaryScanner, highlighting, solr
>
> Words are being truncated at the start of the displayed highlighter fragment.
> The following boundary scanner setting was introduced in the solrconfig.xml file:
> <str name="hl.bs.chars">.,!?  &#9;&#10;&#13;</str>
> If I change the setting to
> <str name="hl.bs.chars">.,!?</str>
> then this issue goes away, but another issue comes up: the highlighted search fragment does not start from the beginning of the sentence.
> Below is the complete list of settings we are using for the boundary scanner.
>    <boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner" default="true">
>      <lst name="defaults">
>        <str name="hl.bs.maxScan">200</str>
>        <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
>      </lst>
>    </boundaryScanner>
>    <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
>      <lst name="defaults">
>        <str name="hl.bs.type">SENTENCE</str>
>        <str name="hl.bs.language">en</str>
>        <str name="hl.bs.country">US</str>
>      </lst>
>    </boundaryScanner>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

