lucene-dev mailing list archives

From "James Dyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3608) Spellcecker: String index out of range: -1
Date Tue, 10 Jul 2012 14:50:36 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410396#comment-13410396 ]

James Dyer commented on SOLR-3608:
----------------------------------

I'm not sure what your QueryConverter is supposed to do, as I'm not at all familiar with how
you need to set up the spellchecker to use it for autosuggest (as it appears you're doing).
You should probably re-post a summary of all this on the solr-user mailing list to get more
help.  (You might also want to see this overview:  http://stackoverflow.com/questions/10547438/solr-returns-only-one-collation-for-suggester-component)

My understanding is that the "collate" functionality was only designed to work with "normal"
query converters.  So if you throw shingled phrases at it from a custom query converter, all
bets are off.  I also think that when people use shingles like this, it is because they are trying
to work around the limitations of "collate" and avoid using it at all.  But many of these limitations
have been removed, particularly with the addition of "maxCollationTries".  Also see SOLR-3240,
which aims to improve the performance of "maxCollationTries" so that it would be more useful
in an autosuggest situation.

I think for the purposes of this JIRA issue, we need to make the spell check collator more
resilient when users throw funny things at it, as in this case.  At the least, it shouldn't
throw an exception.  In some cases it could log a warning, and in others it could be more capable
and actually produce a good collation.  In the "casa saja" case, it could just throw out the
3rd replacement and go on with life.
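
To make the "throw out the 3rd replacement" idea concrete, here is a minimal sketch in plain Java. It is not the actual SpellCheckCollator code: the Correction class, the collate method, and the token offsets (casa at 0-4, saja at 5-9, the shingle "casa saja" at 0-9) are assumptions made up for illustration, and the corrections are assumed to arrive in the order shown in the report. The sketch skips any correction whose span is malformed or overlaps text that an earlier correction already replaced, instead of passing a bad index to StringBuilder.replace.

```java
import java.util.ArrayList;
import java.util.List;

class CollationSketch {

    /** Hypothetical stand-in for one correction: original token offsets plus replacement text. */
    static final class Correction {
        final int start;          // start offset of the original token in the query
        final int end;            // end offset (exclusive) of the original token
        final String replacement; // suggested replacement text

        Correction(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /**
     * Applies corrections in list order, skipping any correction whose span is
     * malformed or overlaps text already consumed by an earlier correction.
     */
    static String collate(String originalQuery, List<Correction> corrections) {
        StringBuilder collation = new StringBuilder(originalQuery);
        int shift = 0;        // net length change caused by replacements so far
        int consumedUpTo = 0; // original-query offset already covered by applied corrections
        for (Correction c : corrections) {
            boolean malformed = c.start < 0 || c.end < c.start || c.end > originalQuery.length();
            if (malformed || c.start < consumedUpTo) {
                continue; // a real implementation might log a warning here
            }
            collation.replace(c.start + shift, c.end + shift, c.replacement);
            shift += c.replacement.length() - (c.end - c.start);
            consumedUpTo = c.end;
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        List<Correction> corrections = new ArrayList<>();
        corrections.add(new Correction(0, 4, "cal"));     // casa > cal
        corrections.add(new Correction(5, 9, "sala"));    // saja > sala
        corrections.add(new Correction(0, 9, "casa de")); // casa saja > casa de (overlaps both)
        System.out.println(collate("casa saja", corrections)); // prints "cal sala"
    }
}
```

With these assumed offsets, the first two corrections are applied and the shingled third one is dropped, so no exception is thrown. Whether dropping the overlapping correction gives the best collation is a separate question; the sketch only covers the "at least don't throw" part.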
                
> Spellcecker: String index out of range: -1
> ------------------------------------------
>
>                 Key: SOLR-3608
>                 URL: https://issues.apache.org/jira/browse/SOLR-3608
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.6
>         Environment: Ubuntu 11.10 x64
> java version "1.7.0_05"
> Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)
>            Reporter: dalius
>            Priority: Blocker
>
> Spell check component throws StringIndexOutOfBoundsException on multiterm search.
> Stack trace: 
> {code}
> SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
> 	at java.lang.StringBuilder.replace(StringBuilder.java:266)
> 	at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:128)
> 	at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
> 	at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
> 	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:156)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
> ...
> {code}
> I have dug some debug info at org.apache.solr.spelling.SpellCheckCollator:69
> {code}
>       String collationQueryStr = getCollation(originalQuery, possibility.getCorrections());
> {code}
> originalQuery is "casa saja"
> possibility is "rank=0     casa>cal (-1)     saja>sala (-1)     casa saja>casa de (-1)"
> The replace function fails on 3rd correction "casa saja>casa de (-1)". I hope this makes any sense.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

