lucene-solr-user mailing list archives

From Shaun Campbell <campbell.sh...@gmail.com>
Subject Highlighting Issue
Date Thu, 09 Dec 2010 12:22:17 GMT
I'm trying to highlight a field and an exception is thrown, but only for
certain search terms. I am fairly certain that the cause is the synonyms on
the highlighted field, as I have had highlighting working in the past on
other fields.

An added complication is that the field I am highlighting is also
n-grammed and stemmed. I think the highlighter cannot match the search term
(which happens to be a synonym) against the actual string retrieved from the
index, and so it fails, possibly only when the string found is longer than a
certain number of characters.

I wonder if anyone has experienced this problem and knows how to get around
it?

My field definition is:

    <!-- An edge nGrammed and stemmed field for the document tags. -->
    <fieldType name="tagphrase_nGram" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="../../common/tag_synonyms.txt"
                ignoreCase="true" expand="true"/>
        <!--<filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />-->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
        <filter class="solr.EdgeNGramFilterFactory"
                minGramSize="1"
                maxGramSize="15"
                side="front" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

My query is:

sort=tagcount+desc&hl.snippets=1&start=0&q=(+%2Btagsearch:asset)+||+(+%2Btagsearchnostem:asset)+&hl.fl=tagsearch&wt=javabin&hl=true&rows=100&version=1
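
For readability, the URL-encoded query string above can be decoded into its individual parameters. The sketch below uses only Python's standard `urllib.parse`; the `decode_params` helper name is made up for this illustration and is not part of Solr or SolrJ.

```python
from urllib.parse import parse_qs

# The raw query string exactly as posted above, split across lines for width.
RAW_QUERY = (
    "sort=tagcount+desc&hl.snippets=1&start=0"
    "&q=(+%2Btagsearch:asset)+||+(+%2Btagsearchnostem:asset)+"
    "&hl.fl=tagsearch&wt=javabin&hl=true&rows=100&version=1"
)


def decode_params(raw):
    """Map each parameter name to its single URL-decoded value."""
    # parse_qs turns '+' into a space and '%2B' into a literal '+'.
    return {name: values[0] for name, values in parse_qs(raw).items()}


params = decode_params(RAW_QUERY)
for name, value in params.items():
    print(f"{name} = {value!r}")
```

Decoded, `q` reads `( +tagsearch:asset) || ( +tagsearchnostem:asset)`, and `hl.fl` requests highlighting on the `tagsearch` field only.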

The exception being thrown is:

09-Dec-2010 11:59:26 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token profo exceeds length of provided text sized 26
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token profo exceeds length of provided text sized 26
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
    ... 18 more
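
The wording of the exception suggests a bounds check of token offsets against the length of the stored text being highlighted. The following is a simplified Python model of that kind of check, not the Lucene source: the `Token` class, the `check_offsets` helper, the 26-character placeholder text, and the sample offsets are all hypothetical, chosen only to reproduce the wording of the message.

```python
from dataclasses import dataclass


class InvalidTokenOffsets(Exception):
    """Stand-in for Lucene's InvalidTokenOffsetsException (illustrative only)."""


@dataclass
class Token:
    term: str
    start: int  # start offset into the original text
    end: int    # end offset into the original text


def check_offsets(text, tokens):
    """Raise if any token claims offsets past the end of the text."""
    for tok in tokens:
        if tok.start > len(text) or tok.end > len(text):
            raise InvalidTokenOffsets(
                f"Token {tok.term} exceeds length of provided text sized {len(text)}"
            )
    return True


# Hypothetical input: a 26-character stored value and a token whose
# offsets (assumed here) point beyond it.
stored_text = "x" * 26
tokens = [Token("asset", 0, 5), Token("profo", 24, 29)]

try:
    check_offsets(stored_text, tokens)
    message = None
except InvalidTokenOffsets as exc:
    message = str(exc)

print(message)
```

If a check of this shape is what fires, then some token produced when the field value is re-analyzed for highlighting carries offsets that do not fit the 26-character stored value; which filter in the chain produces such offsets is not established here.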

Thanks
Shaun
