lucene-java-user mailing list archives

From Nawab Zada Asad Iqbal <khi...@gmail.com>
Subject Re: solr 7.0: possible analysis error: startOffset must be non-negative
Date Wed, 27 Sep 2017 23:11:11 GMT
So, it seems that chaining two WordDelimiterGraphFilterFactory steps (with a
different config in each step) was causing the error. I am still not sure
how the schema ended up in this state, or whether there is any benefit to
having both lines, but removing one of them fixed my error.
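
A minimal sketch of the change described above, applied to the index
analyzer quoted below for 'name_combined'. The message does not say which of
the two WordDelimiterGraphFilterFactory lines was removed, so keeping the
second one (splitOnCaseChange="1", splitOnNumerics="1") is an assumption
made purely for illustration; every other element is copied unchanged from
the quoted schema.

```xml
<analyzer type="index">
  <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
              name="nfkc" mode="compose"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Only one WordDelimiterGraphFilterFactory step remains.
       Assumption: the second of the two original lines is the one kept. -->
  <filter class="solr.WordDelimiterGraphFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" preserveOriginal="1"
          splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
          maxGramSize="255"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  <filter class="solr.LimitTokenCountFilterFactory"
          maxTokenCount="10000" consumeAllTokens="false"/>
  <filter class="solr.LengthFilterFactory" min="1" max="255"/>
</analyzer>
```

The same one-line removal would apply to the 'file_content_en' analyzer,
which carries the identical pair of filters.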


Thanks
Nawab

On Wed, Sep 27, 2017 at 3:12 PM, Nawab Zada Asad Iqbal <khichi@gmail.com>
wrote:

> Hi,
>
> I upgraded to Solr 7 today and I am seeing tons of errors like the
> following for various fields.
>
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException:
> Exception writing document id file_38810000549 to the index; possible
> analysis error: startOffset must be non-negative, and endOffset must be >=
> startOffset, and offsets must not go backwards startOffset=6,endOffset=8,lastStartOffset=9
> for field 'name_combined'
>
> We don't have much custom analysis code at indexing time, so I suspect
> the schema definition. Can someone suggest how I should start debugging
> this?
>
>     <field name="file_content_en"  type="text_stemming_en" indexed="true"
> stored="true" omitPositions="false"/>
>       <analyzer type="index">
>         <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
> name="nfkc" mode="compose"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
>         <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
>         <filter class="solr.PatternReplaceFilterFactory"
> pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" />
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>         <filter class="solr.LimitTokenCountFilterFactory"
> maxTokenCount="10000" consumeAllTokens="false"/>
>         <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>       </analyzer>
>
>
>     <field name="name_combined" type="text_ngram" indexed="true"
> stored="false" multiValued="true" omitPositions="true"/>
>       <analyzer type="index">
>         <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
> name="nfkc" mode="compose"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
>         <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" preserveOriginal="1"
> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
>         <filter class="solr.PatternReplaceFilterFactory"
> pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> maxGramSize="255"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>         <filter class="solr.LimitTokenCountFilterFactory"
> maxTokenCount="10000" consumeAllTokens="false"/>
>         <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>       </analyzer>
>
>
> Thanks
> nawab
>
>
