lucene-general mailing list archives

From Osullivan L. <L.Osulli...@swansea.ac.uk>
Subject RE: charFilter
Date Thu, 13 Sep 2012 10:43:14 GMT
Hi Folks,

I'm getting the following error after using a custom filter:

SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:
Token PR  2823.000000 A0.200000 S0.819880 exceeds length of provided text sized 15

As the error suggests, the input value is PR2823.A2S81988 (15 chars), while the token
produced by my filter is longer. I have been told that the correctOffset() method of the
CharFilter class can be used to resolve this, but as far as I can tell it only returns a
corrected value; it doesn't set anything.

I have included some details below.

Kind Regards,

Luke

In my schema I have:

    <fieldType name="LCNormalized" class="solr.TextField" sortMissingLast="true" omitNorms="true">
        <analyzer>
          <charFilter class="com.test.solr.analysis.LukesTestCharFilterFactory"/>
          <tokenizer class="solr.KeywordTokenizerFactory"/>
        </analyzer>
    </fieldType>

and the factory and filter classes are:

public class LukesTestCharFilterFactory extends BaseCharFilterFactory {

	public CharStream create(CharStream input) {
		return new LukesTestCharFilter(input);
	}
}

public final class LukesTestCharFilter extends BaseCharFilter
{
 ...
  public LukesTestCharFilter(CharStream input) {
      super(input);
      try {
          // Load the whole input into a string
          StringBuilder sb = new StringBuilder();
          char[] buf = new char[1024];

          int len;
          while ((len = input.read(buf)) >= 0) {
              sb.append(buf, 0, len);
          }

          String original = sb.toString();
          String modified = getLCShelfkey(original);
          CharStream result = CharReader.get(new StringReader(modified));

          this.input = result;
          this.input.correctOffset(modified.length());
      } catch (IOException e) {
          System.err.println("There was a problem parsing input.  Skipping.");
      }
  }
 ...
}
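
To make the offset question concrete, here is a minimal, Lucene-free sketch of the kind of bookkeeping that correctOffset() appears to depend on: corrections have to be recorded first, and the lookup then only reads them back. The class OffsetCorrectionSketch is hypothetical and written for illustration. Its method names mirror what I understand BaseCharFilter's protected addOffCorrectMap(int, int) and correct(int) to be in the 3.x line, which is an assumption and not a copy of the Lucene implementation. The figures 35 and 15 are the lengths of the token and the input from the error above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the offset map a char filter keeps (not Lucene code).
public class OffsetCorrectionSketch {

    // Offset in the filtered text -> cumulative difference back to the original text.
    private final TreeMap<Integer, Integer> corrections = new TreeMap<Integer, Integer>();

    // Record that, from offset 'off' onward, 'cumulativeDiff' must be added
    // to a filtered-text offset to get the original-text offset.
    public void addOffCorrectMap(int off, int cumulativeDiff) {
        corrections.put(off, cumulativeDiff);
    }

    // Look up the correction for an offset; this only reads the map.
    public int correct(int currentOff) {
        Map.Entry<Integer, Integer> entry = corrections.floorEntry(currentOff);
        return entry == null ? currentOff : currentOff + entry.getValue();
    }

    public static void main(String[] args) {
        int originalLength = 15;  // "PR2823.A2S81988"
        int modifiedLength = 35;  // "PR  2823.000000 A0.200000 S0.819880"

        OffsetCorrectionSketch map = new OffsetCorrectionSketch();
        // Assumption: with a keyword tokenizer only the end offset needs mapping,
        // so one entry at the end of the filtered text is enough for this sketch.
        map.addOffCorrectMap(modifiedLength, originalLength - modifiedLength);

        System.out.println("start offset: " + map.correct(0));
        System.out.println("end offset:   " + map.correct(modifiedLength));
    }
}
```

If that reading is right, calling correctOffset() in the constructor and discarding the result changes nothing; the mapping would have to be registered in the filter itself so that the tokenizer's later calls to correctOffset() pick it up.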
