lucene-solr-dev mailing list archives

From "Anders Melchiorsen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1404) Random failures with highlighting
Date Mon, 07 Sep 2009 08:33:57 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752044#action_12752044 ]

Anders Melchiorsen commented on SOLR-1404:
------------------------------------------

Hi Igor, thanks for the patch.

It does seem to work for me. I will leave it for others to decide whether it is the best fix.
If the issue is not fixed at a lower layer, note that the HTMLStripStandardTokenizerFactory
seems to have a similar problem.

I previously reported that this problem also exists with other tokenizers, including the
HTMLStripCharFilterFactory+WhitespaceTokenizerFactory combo that you recommend. Today, however,
I cannot reproduce that behaviour. As I have been reporting several issues, I was most likely
confused by having multiple configurations running at the same time.


> Random failures with highlighting
> ---------------------------------
>
>                 Key: SOLR-1404
>                 URL: https://issues.apache.org/jira/browse/SOLR-1404
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis, highlighter
>    Affects Versions: 1.4
>            Reporter: Anders Melchiorsen
>             Fix For: 1.4
>
>         Attachments: SOLR-1404.patch
>
>
> With a recent Solr nightly, we started getting errors when highlighting.
> I have not been able to reduce our real setup to a minimal one that is failing, but the
> same error seems to pop up with the configuration below. Note that the QUERY will mostly fail,
> but it will work sometimes. Notably, after running "java -jar start.jar", the QUERY will work
> the first time, but then start failing for a while. Seems that something is not being reset
> properly.
> The example uses the deprecated HTMLStripWhitespaceTokenizerFactory but the problem apparently
> also exists with other tokenizers; I was just unable to create a minimal example with other
> configurations.
> SCHEMA
> <?xml version="1.0" encoding="UTF-8" ?>
> <schema name="example" version="1.2">
>   <types>
>     <fieldType name="string" class="solr.StrField" />
>     <fieldtype name="testtype" class="solr.TextField">
>       <analyzer>
>         <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory" />
>       </analyzer>
>     </fieldtype>
>  </types>
>  <fields>
>    <field name="id" type="string" indexed="true" stored="false" />
>    <field name="test" type="testtype" indexed="false" stored="true" />
>  </fields>
>  <uniqueKey>id</uniqueKey>
> </schema>
> INDEX
> URL=http://localhost:8983/solr/update
> curl $URL --data-binary '<add><doc><field name="id">1</field><field name="test">test</field></doc></add>' -H 'Content-type:text/xml; charset=utf-8'
> curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
> QUERY
> curl 'http://localhost:8983/solr/select/?hl.fl=test&hl=true&q=id:1'
> ERROR
> org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4
> org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4
> 	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:328)
> 	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
> 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> 	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> 	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> 	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> 	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> 	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> 	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> 	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> 	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> 	at org.mortbay.jetty.Server.handle(Server.java:285)
> 	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> 	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
> 	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
> 	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> 	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> 	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> 	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token test exceeds length of provided text sized 4
> 	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
> 	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:321)
> 	... 23 more
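
The description above suspects that "something is not being reset properly". As a rough illustration only, the following self-contained Java sketch shows how a reused tokenizer that carries offset state from one call to the next could produce a token whose end offset exceeds the length of the text being highlighted. It does not use Lucene or Solr classes: `Tok`, `ReusableTokenizer`, and `offsetsValid` are hypothetical names invented for this example, and the real cause of SOLR-1404 may well be different.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleOffsetDemo {

    /** Minimal stand-in for a token with character offsets (not Lucene's Token class). */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical whitespace tokenizer that keeps a running base offset between calls. */
    static final class ReusableTokenizer {
        private final boolean resetOnReuse;
        private int base = 0;

        ReusableTokenizer(boolean resetOnReuse) {
            this.resetOnReuse = resetOnReuse;
        }

        List<Tok> tokenize(String input) {
            if (resetOnReuse) {
                base = 0; // the reset step that the buggy variant skips
            }
            List<Tok> out = new ArrayList<>();
            int i = 0;
            while (i < input.length()) {
                // Skip whitespace, then consume one run of non-whitespace characters.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                int s = i;
                while (i < input.length() && !Character.isWhitespace(input.charAt(i))) i++;
                if (i > s) {
                    out.add(new Tok(input.substring(s, i), base + s, base + i));
                }
            }
            base += input.length(); // stale state leaks into the next call unless reset
            return out;
        }
    }

    /** Bounds check in the spirit of the reported error: offsets must fit inside the text. */
    static boolean offsetsValid(List<Tok> toks, String text) {
        for (Tok t : toks) {
            if (t.start < 0 || t.start > t.end || t.end > text.length()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "test";
        ReusableTokenizer buggy = new ReusableTokenizer(false);
        System.out.println("without reset, first use: " + offsetsValid(buggy.tokenize(text), text));
        System.out.println("without reset, reuse:     " + offsetsValid(buggy.tokenize(text), text));
        ReusableTokenizer fixed = new ReusableTokenizer(true);
        System.out.println("with reset, first use:    " + offsetsValid(fixed.tokenize(text), text));
        System.out.println("with reset, reuse:        " + offsetsValid(fixed.tokenize(text), text));
    }
}
```

In this sketch the first call yields offsets 0..4 for "test", while the second call on the non-resetting instance yields 4..8, which no longer fits a text of length 4. That loosely matches the "works the first time, then fails" symptom, but it does not model the intermittent successes reported above.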

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

