lucene-dev mailing list archives

From "dalius (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3608) Spellcecker: String index out of range: -1
Date Mon, 09 Jul 2012 16:33:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409617#comment-13409617 ]

dalius commented on SOLR-3608:
------------------------------

Hello, I am not using shingles. Here are my fieldTypes:

{code}
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\.|,|;|\?|!|\s+" />
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\.|,|;|\?|!|\s+" />
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="xml" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="xml_whitespace_token" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>
{code}

I have not written a special query converter. I just pass the query string "casa saja" to my request handler:
{code}
?spellcheck=on&start=0&q=casa+saja&spellcheck.collate=true&rows=10&version=2
{code}

This is my request handler:
{code}
    <requestHandler name="search" class="solr.SearchHandler" default="true">
         <lst name="defaults">
           <str name="echoParams">all</str>
           <str name="defType">edismax</str>
           <str name="mm">2&lt;-25%</str>
           <str name="hl">true</str>
           <str name="hl.fl">
              search_dc_title
              search_title
              search_phrases
              search_definitions
              search_full_definitions
              search_examples
              search_works
              search_bibliographies
              search_theme
              search_date_of_birth
              search_date_of_death
              search_state_of_birth
              search_state_of_death
              search_place_of_birth
              search_place_of_death
              search_photographies
           </str>
           <str name="hl.fragsize">500</str>
           <str name="qf">
              search_dc_title^3
              search_title
              search_phrases
              search_definitions
              search_full_definitions
              search_examples
              search_works
              search_bibliographies
              search_theme
              search_date_of_birth
              search_date_of_death
              search_state_of_birth
              search_state_of_death
              search_place_of_birth
              search_place_of_death
              search_photographies
           </str>
           <int name="rows">10</int>
           <str name="sort">score desc, weight desc</str>
           <str name="spellcheck.onlyMorePopular">true</str>
           <str name="spellcheck.extendedResults">false</str>
           <str name="spellcheck.count">1</str>
         </lst>
        <arr name="last-components">
          <str>spellcheck</str>
        </arr>
    </requestHandler>
{code}

All of these fields are of type text_general, xml, or xml_whitespace_token.
                
> Spellcecker: String index out of range: -1
> ------------------------------------------
>
>                 Key: SOLR-3608
>                 URL: https://issues.apache.org/jira/browse/SOLR-3608
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.6
>         Environment: Ubuntu 11.10 x64
> java version "1.7.0_05"
> Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)
>            Reporter: dalius
>            Priority: Blocker
>
> Spell check component throws StringIndexOutOfBoundsException on multiterm search.
> Stack trace: 
> {code}
> SEVERE: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> 	at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
> 	at java.lang.StringBuilder.replace(StringBuilder.java:266)
> 	at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:128)
> 	at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:69)
> 	at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:179)
> 	at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:156)
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
> ...
> {code}
> I dug up some debug info at org.apache.solr.spelling.SpellCheckCollator:69
> {code}
>       String collationQueryStr = getCollation(originalQuery, possibility.getCorrections());
> {code}
> originalQuery is "casa saja"
> possibility is "rank=0     casa>cal (-1)     saja>sala (-1)     casa saja>casa de (-1)"
> The replace function fails on the third correction, "casa saja>casa de (-1)". I hope this makes sense.
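
The failure described above can be reproduced outside Solr with a small sketch. This is not the Solr source: the class and field names (CollationSketch, Correction) are hypothetical, the character offsets are assumed from the positions of the terms in "casa saja", and the running-offset strategy is only a guess at how an offset-based collator might apply corrections in order.

```java
import java.util.List;

// Hypothetical sketch, not the actual SpellCheckCollator implementation.
final class CollationSketch {

    // Hypothetical stand-in for one correction: the character span of the
    // original term in the query, plus the suggested replacement text.
    static final class Correction {
        final int start;          // start offset in the original query (inclusive)
        final int end;            // end offset in the original query (exclusive)
        final String replacement;

        Correction(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    // Applies corrections in list order, shifting each span by the length
    // change caused by earlier replacements. This is only valid when the
    // spans do not overlap; an overlapping span can yield a negative start.
    static String collate(String originalQuery, List<Correction> corrections) {
        StringBuilder collation = new StringBuilder(originalQuery);
        int offset = 0;
        for (Correction c : corrections) {
            collation.replace(c.start + offset, c.end + offset, c.replacement);
            offset += c.replacement.length() - (c.end - c.start);
        }
        return collation.toString();
    }
}
```

Under these assumptions, "casa" (0..4) becomes "cal" and leaves the running offset at -1, and "saja" (5..9) becomes "sala" without changing it. The third correction covers the whole query (0..9), so its shifted start is 0 + (-1) = -1, and StringBuilder.replace throws StringIndexOutOfBoundsException. With only the first two corrections the sketch returns "cal sala".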
