lucene-dev mailing list archives

From "James Dyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4115) WordBreakSpellChecker throws ArrayIndexOutOfBoundsException for random query string
Date Fri, 30 Nov 2012 18:31:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507530#comment-13507530 ]

James Dyer commented on SOLR-4115:
----------------------------------

Correct me if I'm wrong, but Lucene is only able to take valid UTF-8 as input, right?  So
oal.util.UnicodeUtil.UTF8toUTF16 doesn't like \uD864\uDC79 because it's invalid UTF-8.

See http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/util/UnicodeUtil.html#UTF8toUTF16%28org.apache.lucene.util.BytesRef,%20org.apache.lucene.util.CharsRef%29
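
One way to double-check that assumption using only the JDK is sketched below. The class and
method names (SurrogateCheck, isWellFormedUtf16, utf8Length) are made up for illustration and
nothing here calls Lucene's UnicodeUtil; it only inspects the query string from the report to
see whether the two chars form a proper surrogate pair and how many bytes the JDK's UTF-8
encoder produces for them.

```java
import java.nio.charset.Charset;

public class SurrogateCheck {

    // Hypothetical helper: true if every high surrogate is immediately
    // followed by a low surrogate and no low surrogate stands alone.
    public static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false; // unpaired high surrogate
                }
                i++; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    // Hypothetical helper: number of bytes the JDK's UTF-8 encoder emits.
    public static int utf8Length(String s) {
        return s.getBytes(Charset.forName("UTF-8")).length;
    }

    public static void main(String[] args) {
        String q = "\uD864\uDC79"; // query string from the issue report
        System.out.println(isWellFormedUtf16(q));                    // prints "true"
        System.out.println(Integer.toHexString(q.codePointAt(0)));   // prints "29079"
        System.out.println(utf8Length(q));                           // prints "4"
    }
}
```

This only checks the string as the client sends it. It says nothing about what bytes end up in
the BytesRef that UTF8toUTF16 receives inside WordBreakSpellChecker.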
                
> WordBreakSpellChecker throws ArrayIndexOutOfBoundsException for random query string
> -----------------------------------------------------------------------------------
>
>                 Key: SOLR-4115
>                 URL: https://issues.apache.org/jira/browse/SOLR-4115
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 4.0
>         Environment: java version "1.6.0_37"
> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>            Reporter: Andreas Hubold
>
> The following SolrJ test code causes an ArrayIndexOutOfBoundsException in the WordBreakSpellChecker.
> I tested this with the Solr 4.0.0 example webapp started with {{java -jar start.jar}}.
> {code:java}
>   @Test
>   public void testWordbreakSpellchecker() throws Exception {
>     SolrQuery q = new SolrQuery("\uD864\uDC79");
>     q.setRequestHandler("/browse");
>     q.setParam("spellcheck.dictionary", "wordbreak");
>     HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
>     server.query(q, SolrRequest.METHOD.POST);
>   }
> {code}
> {noformat}
> INFO: [collection1] webapp=/solr path=/browse params={spellcheck.dictionary=wordbreak&qt=/browse&wt=javabin&q=?&version=2} hits=0 status=500 QTime=11
> Nov 28, 2012 11:23:01 AM org.apache.solr.common.SolrException log
> SEVERE: null:java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:599)
> 	at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:165)
> 	at org.apache.lucene.index.Term.text(Term.java:72)
> 	at org.apache.lucene.search.spell.WordBreakSpellChecker.generateSuggestWord(WordBreakSpellChecker.java:350)
> 	at org.apache.lucene.search.spell.WordBreakSpellChecker.generateBreakUpSuggestions(WordBreakSpellChecker.java:283)
> 	at org.apache.lucene.search.spell.WordBreakSpellChecker.suggestWordBreaks(WordBreakSpellChecker.java:122)
> 	at org.apache.solr.spelling.WordBreakSolrSpellChecker.getSuggestions(WordBreakSolrSpellChecker.java:229)
> 	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:172)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
> 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> 	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> 	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> 	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> 	at org.eclipse.jetty.server.Server.handle(Server.java:351)
> 	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> 	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> 	at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> 	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> 	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
> 	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> 	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> 	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> 	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> 	at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The query string is a random one (we found it in a randomized test). Other random strings work.
> There are no problems with this query string when the DirectSolrSpellChecker is used or during search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
