lucene-dev mailing list archives

From "Koji Sekiguchi (Commented) (JIRA)" <>
Subject [jira] [Commented] (SOLR-3110) Search result comes up with truncated words at the start of highlighted fragment
Date Mon, 20 Feb 2012 08:00:37 GMT


Koji Sekiguchi commented on SOLR-3110:

Hi Shyam,

From the mail thread:

Thanks for the response. When I use ".!?" I see improvements;
below is the highlighted value:

"The synthesis tool only supports the resolution functions for <em>std_logic</em>
and std_logic_vector."

But in other cases I also see that some of the words break in between as shown below

Original text: " How Are Clock Gating Checks Inferred"

When searching for the term "clock", the highlighted text is displayed as shown below:

"w Are <em>Clock</em> Gating Checks Inferred"

As you can see only w is displayed from the word How.

I couldn't reproduce your problem. I'm using trunk, and I got the following snippet,
which is the one I expected:

<lst name="highlighting">
  <lst name="2">
    <arr name="includes">
      <str> How Are <em>Clock</em> Gating Checks Inferred</str>

My BoundaryScanner setting is:

<boundaryScanner name="default"
  <lst name="defaults">
    <str name="">100</str>
    <str name="">.!?</str>

My request was:


I'm using the following sample data that's been provided by Shyam in the mail thread:

    <field name="id">1</field>
    <field name="includes">User-defined resolution functions. The synthesis tool only
supports the</field>
    <field name="includes">resolution functions for std_logic and std_logic_vector.</field>
    <field name="includes"></field>
    <field name="includes">Slices with range indices that do not evaluate to constants</field>
    <field name="id">2</field>
    <field name="includes"> How Are Clock Gating Checks Inferred</field>

Here I changed the includes field to multiValued in the example schema.xml.

Can you verify it?

> Search result comes up with truncated words at the start of highlighted fragment
> --------------------------------------------------------------------------------
>                 Key: SOLR-3110
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.0
>         Environment: java Tomcat Solaris
>            Reporter: Shyam Bhaskaran
>              Labels: FastVectorHighlighter, boundaryScanner, highlighting, solr
> Words are being truncated at the start of the highlighter fragment.
> The following boundary scanner setting was introduced in the solrconfig.xml file:
> <str name="">.,!?  &\#9;&\#10;&\#13;</str>  
> If I change the settings to 
> <str name="">.,!?</str>
> then this issue goes away, but another issue comes up: the highlighted
> search fragment does not start from the beginning of the sentence.
> Below is the complete list of settings we are using for the boundary scanner.
>    <boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner" default="true">
>      <lst name="defaults">
>        <str name="">200</str>
>        <str name="">.,!? &\#9;&\#10;&\#13;</str>
>      </lst>
>    </boundaryScanner>
>    <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
>      <lst name="defaults">
>        <str name="">SENTENCE</str>
>        <str name="">en</str>
>        <str name="">US</str>
>      </lst>
>    </boundaryScanner>
