lucene-java-user mailing list archives

From "Igal @ getRailo.org" <i...@getrailo.org>
Subject Re: tokenizer's tokens
Date Thu, 01 Nov 2012 23:50:11 GMT
thank you :)


On 11/1/2012 4:45 PM, Robert Muir wrote:
> This is intentional (since you have a bug in your code).
>
> You need to call reset() before the first incrementToken(): see the TokenStream contract, step 2:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/TokenStream.html
>
> On Thu, Nov 1, 2012 at 7:31 PM, Igal @ getRailo.org <igal@getrailo.org> wrote:
>> I'm trying to write a very simple method to show the different tokens that
>> come out of a tokenizer. When I call WhitespaceTokenizer's (or
>> LetterTokenizer's) incrementToken() method, though, I get an
>> ArrayIndexOutOfBoundsException (see below).
>>
>> Any ideas?
>>
>> P.S. If I use StandardTokenizer, it works.
>>
>>
>> java.lang.ArrayIndexOutOfBoundsException: -1
>>      at java.lang.Character.codePointAtImpl(Character.java:4739)
>>      at java.lang.Character.codePointAt(Character.java:4702)
>>      at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
>>      at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
>>      at test.Test1.tokenize(Test1.java:46)
>>      at test.Test1.main(Test1.java:139)
>>
>>
>> class Test1 {
>>
>>      static Version v = Version.LUCENE_40;
>>
>>
>>      static void tokenize( String s ) throws IOException {
>>
>>          Reader r = new StringReader( s );
>>
>>          Tokenizer t = new WhitespaceTokenizer( v, r );
>>
>>          CharTermAttribute attrTerm = t.getAttribute( CharTermAttribute.class );
>>
>>          while ( t.incrementToken() ) {
>>
>>              String term = attrTerm.toString();
>>
>>              System.out.println( term );
>>          }
>>      }
>>
>>
>>      public static void main( String[] args ) throws IOException {
>>
>>          String[] text = {
>>
>>              "The quick brown fox jumps over the lazy dog",
>>              "Only the fool would take trouble to verify that his sentence was composed of ten a's, three b's, four c's, four d's, forty-six e's, sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four m's, twenty-five n's, twenty-four o's, five p's, sixteen r's, forty-one s's, thirty-seven t's, ten u's, eight v's, eight w's, four x's, eleven y's, twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but not least, a single!",
>>
>>          };
>>
>>          for ( String s : text )
>>              tokenize( s );
>>
>>      }
>>
>> }
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
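
For readers following the thread, here is a minimal sketch of the consumer call order that the reply points to (reset(), then incrementToken() until it returns false, then end() and close()). It deliberately avoids a Lucene dependency: `SketchTokenizer` and `ResetContractSketch` are hypothetical stand-ins, not Lucene classes, and the IllegalStateException thrown when reset() is skipped is an assumption chosen to make the mistake visible (the real tokenizer in the thread failed with an ArrayIndexOutOfBoundsException instead).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a whitespace tokenizer; NOT Lucene code.
// It only mimics the call order of the TokenStream consumer workflow.
class SketchTokenizer {
    private final String input;
    private int pos = -1;      // -1 means reset() has not been called yet
    private String term = "";

    SketchTokenizer(String input) {
        this.input = input;
    }

    // Step 2 of the workflow: prepare the stream for consumption.
    void reset() {
        pos = 0;
    }

    // Advances to the next whitespace-delimited token; false at end of input.
    boolean incrementToken() {
        if (pos < 0) {
            // Assumption: fail loudly here instead of with an obscure index error.
            throw new IllegalStateException("reset() must be called before incrementToken()");
        }
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return false;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        term = input.substring(start, pos);
        return true;
    }

    String term() {
        return term;
    }

    void end() {
        // End-of-stream bookkeeping would go here; nothing to do in this sketch.
    }

    void close() {
        // Resource release would go here; nothing to do in this sketch.
    }
}

class ResetContractSketch {
    // Consumes the stream in the documented order: reset, loop, end, close.
    static List<String> tokenize(String s) {
        SketchTokenizer t = new SketchTokenizer(s);
        List<String> terms = new ArrayList<>();
        try {
            t.reset();
            while (t.incrementToken()) {
                terms.add(t.term());
            }
            t.end();
        } finally {
            t.close();
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : tokenize("The quick brown fox jumps over the lazy dog")) {
            System.out.println(term);
        }
    }
}
```

Applied to the Test1.tokenize method quoted above, the corresponding change would be to call t.reset() before the while loop and t.end() and t.close() after it, as described in the TokenStream contract linked in the reply.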



