lucene-solr-dev mailing list archives

From "Luke Forehand (JIRA)" <j...@apache.org>
Subject [jira] Created: (SOLR-1883) Highlighting failure caused by InvalidTokenOffsetsException
Date Tue, 13 Apr 2010 17:37:19 GMT
Highlighting failure caused by InvalidTokenOffsetsException
-----------------------------------------------------------

                 Key: SOLR-1883
                 URL: https://issues.apache.org/jira/browse/SOLR-1883
             Project: Solr
          Issue Type: Bug
          Components: highlighter
    Affects Versions: 1.4
         Environment: {code:title=java}
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
{code}
{code:title=solr lib manifest}
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 14.1-b02-90 (Apple Inc.)
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.4.0
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:
 33:40
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5
{code}
{code:title=OS}
Linux myhost 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
{code}
            Reporter: Luke Forehand



This issue appears to be the same as SOLR-1404 (https://issues.apache.org/jira/browse/SOLR-1404),
which was bulk closed in Solr 1.4. The same bug has also been reported against Lucene 2.9.1
as LUCENE-2208 (https://issues.apache.org/jira/browse/LUCENE-2208).
We are experiencing this issue as well.

I have pasted the relevant part of our schema.xml and the Solr exception below.  I have also
attached the document that fails when it is queried with highlighting enabled.  The invalid token
seems to be 'system', which is the very last token in the document field (see the attached file).

{code:title=schema.xml}
<?xml version="1.0" encoding="UTF-8"?>

<schema name="xxx" version="1.1">

	<types>

		<fieldType name="scrubbedText" class="solr.TextField" positionIncrementGap="100">
			<analyzer>
				<tokenizer class="solr.StandardTokenizerFactory" />
				<charFilter class="solr.HTMLStripCharFilterFactory" />
				<filter class="solr.StandardFilterFactory" />
				<filter class="solr.LowerCaseFilterFactory" />
				<filter class="solr.StopFilterFactory" />
			</analyzer>
		</fieldType>
		...
	</types>

	<fields>
		<field name="id" type="string" stored="true" indexed="true" />
		<field name="textScrubbed" type="scrubbedText" stored="true" indexed="true" />
		...
	</fields>

	<uniqueKey>id</uniqueKey>
	<defaultSearchField>textScrubbed</defaultSearchField>

</schema>
{code}

{code:title=solr.log exception}
Apr 13, 2010 3:08:35 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
        at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
        at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
        ... 18 more
{code}
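
For readers unfamiliar with the message in the log above, the following is a minimal, self-contained sketch of the kind of bounds check that would produce it: a token's character offsets are compared against the length of the text being highlighted. It does not use Lucene or Solr classes. The names `OffsetCheckSketch`, `Token`, `firstInvalidToken` and `describe` are hypothetical, and the sample text and offsets are made up for illustration. It is not taken from the attached document and does not reproduce the actual Highlighter source.

```java
import java.util.Arrays;
import java.util.List;

public class OffsetCheckSketch {

    /** Hypothetical stand-in for a token with character offsets; not a Lucene class. */
    public static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;

        public Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Returns the first token whose offsets point past the end of the text,
     * or null if every token fits inside it.
     */
    public static Token firstInvalidToken(String text, List<Token> tokens) {
        for (Token t : tokens) {
            if (t.startOffset > text.length() || t.endOffset > text.length()) {
                return t;
            }
        }
        return null;
    }

    /** Formats a message in the style of the one seen in the log. */
    public static String describe(String text, Token t) {
        return "Token " + t.term + " exceeds length of provided text sized " + text.length();
    }

    public static void main(String[] args) {
        // Made-up stored text of 16 characters.
        String stored = "operating system";
        // The last token claims an end offset of 19, which is past the end of the text.
        List<Token> tokens = Arrays.asList(
                new Token("operating", 0, 9),
                new Token("system", 10, 19));

        Token bad = firstInvalidToken(stored, tokens);
        if (bad != null) {
            System.out.println(describe(stored, bad));
        }
    }
}
```

In this sketch only the final token fails the check, which mirrors the observation that 'system' is the last token in the field. Why the real offsets end up past the end of the stored text is not determined here.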

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
