lucene-java-user mailing list archives

From Alon Muchnick <a...@datonics.com>
Subject Re: Lucene 4.0 WhitespaceAnalyzer problem
Date Tue, 15 Jan 2013 11:48:09 GMT
Hi Maxim,

You need to reset the tokenStream before the while loop by calling
tokenStream.reset().

Check out
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/package-summary.html

and look under "Invoking the Analyzer":

"ts.reset(); // Resets this stream to the beginning. (Required)"
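
For reference, here is a minimal sketch of the full consumer sequence that page describes (reset, incrementToken loop, end, close), assuming Lucene 4.0 is on the classpath. The class name WhitespaceTokenizeSketch and the tokenize helper are made up for illustration and are not part of Lucene:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

class WhitespaceTokenizeSketch {

    // Hypothetical helper (not a Lucene API): collects the terms an analyzer emits.
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<String>();
        // The empty field name mirrors the original test.
        TokenStream ts = analyzer.tokenStream("", new StringReader(text));
        // Fetch the term attribute once, before consuming the stream.
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();                        // signals that the stream was fully consumed
        } finally {
            ts.close();                      // releases the underlying reader
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        Analyzer wa = new WhitespaceAnalyzer(Version.LUCENE_40);
        for (String t : tokenize(wa, "sdfdsf sdfsdf sd sdf ")) {
            System.out.println(t);
        }
    }
}
```

Fetching the attribute once with addAttribute, rather than calling getAttribute inside the loop, also avoids a lookup on every token.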


Alon


On Tue, Jan 15, 2013 at 1:28 PM, Maksym Krasovskiy <makr@ciklum.com> wrote:

> Hi!
> I am trying to use WhitespaceAnalyzer from Lucene 4.0 to split strings into
> words.
> I wrote a small test:
> @Test
> public void whitespaceAnalyzerTest() throws IOException {
>     String string = "sdfdsf sdfsdf sd sdf ";
>     Analyzer wa = new WhitespaceAnalyzer(Version.LUCENE_40);
>     TokenStream tokenStream = wa.tokenStream("", new StringReader(string));
>     while (tokenStream.incrementToken()) {
>         System.out.println(tokenStream.getAttribute(CharTermAttribute.class).toString());
>     }
> }
>
> but got an exception:
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.lang.Character.codePointAtImpl(Character.java:2405)
>     at java.lang.Character.codePointAt(Character.java:2369)
>     at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
>     at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
>     at com.maxx.tests.lucene40test.analyzer.AnalyzerTest.whitespaceAnalyzerTest(AnalyzerTest.java:93)
>     ...
>
>
> If I change WhitespaceAnalyzer to StandardAnalyzer, it works correctly.
> As a workaround I can create a StandardAnalyzer without stopwords, but why
> doesn't my code work?
>
>
>
> --
> Krasovskiy Maxim
>
