lucene-dev mailing list archives

From "dalius (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-3608) Spellcecker: String index out of range: -1
Date Tue, 10 Jul 2012 09:48:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410184#comment-13410184 ]

dalius edited comment on SOLR-3608 at 7/10/12 9:46 AM:
-------------------------------------------------------

My mistake. I said there was no query converter, but there actually is one:
{code}
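import java.util.Collection;
import java.util.Collections;

import com.google.common.base.Joiner;
import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.SpellingQueryConverter;

/** Query converter that also emits one extra token spanning the whole multi-term query. */
public class MultiTermQueryConverter extends SpellingQueryConverter {
    private static final Joiner space = Joiner.on(' ');

    @Override
    public Collection<Token> convert(String original) {
        if (original == null) { // this can happen with q.alt = and no query
            return Collections.emptyList();
        }
        Collection<Token> convert = super.convert(original);
        if (convert.size() > 1) {
            // Join all tokens with single spaces (relies on Token.toString()).
            String joined = space.join(convert);
            // Offsets of the joined token: smallest start to largest end.
            // The initial min of 100 assumes start offsets below 100.
            int min = 100, max = 0;
            for (Token t : convert) {
                min = Math.min(min, t.startOffset());
                max = Math.max(max, t.endOffset());
            }
            convert.add(new Token(joined, min, max));
        }
        return convert;
    }
}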
{code}

{code}
  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <!-- Alternatives to lookupImpl: 
           org.apache.solr.spelling.suggest.fst.FSTLookup   [finite state automaton]
           org.apache.solr.spelling.suggest.jaspell.JaspellLookup [default, jaspell-based]
           org.apache.solr.spelling.suggest.tst.TSTLookup   [ternary trees]
      -->
      <str name="field">suggest</str>  <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.00001</float>
      <str name="buildOnOptimize">true</str>
      <str name="buildOnCommit">false</str>
    <!--   <str name="sourceLocation">spellings.txt</str> -->
    </lst>
  </searchComponent>
{code}

The converter adds an additional token that joins all of the tokens, separated by spaces.
Shouldn't the spellchecker just ignore a token that cannot be replaced instead?

Sorry about that.
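
To make the question concrete, here is a minimal sketch of a collation step that skips a correction when its span overlaps text that has already been replaced. This is not Solr's code: the class `SkippingCollationSketch`, the `Correction` holder, and the `covered` bookkeeping are hypothetical names for illustration, and the sketch assumes the corrections that get applied arrive in ascending offset order.

```java
import java.util.Arrays;
import java.util.List;

class SkippingCollationSketch {
    /** Hypothetical stand-in for a correction: original offsets plus replacement text. */
    static final class Correction {
        final int start;
        final int end;
        final String replacement;

        Correction(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Applies corrections in order, skipping any that overlap an earlier replacement. */
    static String collate(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0;  // net length change from the replacements applied so far
        int covered = 0; // original-query offset up to which text was already replaced
        for (Correction c : corrections) {
            if (c.start < covered) {
                continue; // overlaps an earlier replacement, so leave it out
            }
            sb.replace(c.start + offset, c.end + offset, c.replacement);
            offset += c.replacement.length() - (c.end - c.start);
            covered = c.end;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three corrections from the report, with the joined token last.
        List<Correction> corrections = Arrays.asList(
                new Correction(0, 4, "cal"),
                new Correction(5, 9, "sala"),
                new Correction(0, 9, "casa de"));
        System.out.println(collate("casa saja", corrections)); // prints "cal sala"
    }
}
```

With the reported input, the joined token is dropped because it starts inside text that was already rewritten. Whether skipping is the right policy, as opposed to preferring the joined suggestion, is a separate design decision.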
                
                  
> Spellcecker: String index out of range: -1
> ------------------------------------------
>
>                 Key: SOLR-3608
>                 URL: https://issues.apache.org/jira/browse/SOLR-3608
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.6
>         Environment: Ubuntu 11.10 x64
> java version "1.7.0_05"
> Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)
>            Reporter: dalius
>            Priority: Blocker
>
> Spell check component throws StringIndexOutOfBoundsException on multiterm search.
> Stack trace: 
> {code}
> SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
> 	at java.lang.StringBuilder.replace(StringBuilder.java:266)
> 	at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:128)
> 	at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
> 	at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
> 	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:156)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
> ...
> {code}
> I dug out some debug info at org.apache.solr.spelling.SpellCheckCollator:69:
> {code}
>       String collationQueryStr = getCollation(originalQuery, possibility.getCorrections());
> {code}
> originalQuery is "casa saja"
> possibility is "rank=0     casa>cal (-1)     saja>sala (-1)     casa saja>casa de (-1)"
> The replace call fails on the third correction, "casa saja>casa de (-1)". I hope this makes sense.
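
The failure described in the report can be reproduced outside Solr with a small sketch. It assumes, based on the stack trace, that the collator applies each correction with `StringBuilder.replace` in list order and shifts later offsets by the length change so far. The class `CollationFailureSketch` and its `Correction` holder are hypothetical names for illustration, not Solr classes.

```java
import java.util.Arrays;
import java.util.List;

class CollationFailureSketch {
    /** Hypothetical stand-in for a correction: original offsets plus replacement text. */
    static final class Correction {
        final int start;
        final int end;
        final String replacement;

        Correction(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Applies every correction in order, shifting offsets by the length change so far. */
    static String collate(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0;
        for (Correction c : corrections) {
            sb.replace(c.start + offset, c.end + offset, c.replacement);
            offset += c.replacement.length() - (c.end - c.start);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Correction> corrections = Arrays.asList(
                new Correction(0, 4, "cal"),      // casa -> cal, running offset becomes -1
                new Correction(5, 9, "sala"),     // saja -> sala, running offset stays -1
                new Correction(0, 9, "casa de")); // joined token: start 0 + (-1) = -1
        try {
            System.out.println(collate("casa saja", corrections));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("collation failed: " + e.getMessage());
        }
    }
}
```

The first two replacements shrink the string by one character, so the joined token's start offset of 0 is shifted to -1 and `replace` rejects it. The exact exception message depends on the Java version.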
