lucene-java-user mailing list archives

From Jake Clawson <clawsonj...@yahoo.com.INVALID>
Subject Re: Lucene 5.5.0 StopFilter Error
Date Thu, 25 Feb 2016 22:02:10 GMT
Thanks for the quick response. 


I checked everything you pointed out, and the following is what got the code working
for me:

> In your code you called reset() twice on the Tokenizer: first directly, and then
> implicitly through the filter.


I removed the tokenizer.reset() call at the beginning and the tokenizer.close() call at the end. 
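
For reference, here is roughly what the working method looks like with those two calls removed
(asking the filter for the CharTermAttribute instead of the tokenizer behaves the same, since the
filter shares the tokenizer's attribute source):

public static String removeStopWords(String strInput) throws Exception {
    AttributeFactory factory = AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;
    StandardTokenizer tokenizer = new StandardTokenizer(factory);
    tokenizer.setReader(new StringReader(strInput));
    CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();

    // Build the whole chain first, then consume and close only the top filter.
    TokenStream streamStop = new StopFilter(tokenizer, stopWords);
    CharTermAttribute charTermAttribute = streamStop.addAttribute(CharTermAttribute.class);

    StringBuilder sb = new StringBuilder();
    streamStop.reset();                 // reset exactly once, on the top of the chain
    while (streamStop.incrementToken()) {
        String term = charTermAttribute.toString();
        sb.append(term).append(' ');
    }
    streamStop.end();
    streamStop.close();                 // delegates downstream and closes the tokenizer as well

    return sb.toString();
}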



It works now. Thanks!

Jake Clawson



----- Original Message -----
From: Uwe Schindler <uwe@thetaphi.de>
To: java-user@lucene.apache.org
Sent: Thursday, February 25, 2016 4:53 PM
Subject: Re: Lucene 5.5.0 StopFilter Error

You must build the whole stream including all filters first and then consume it. So first
create the Tokenizer, then wrap it with the filter. Once all this is done, you consume only the
filter on top, using the standard workflow. You don't need the Tokenizer anymore (you can drop its
reference); the filter delegates everything downstream. Finally, close only the filter, not the Tokenizer.

In your code you called reset() twice on the Tokenizer: first directly, and then implicitly
through the filter. 

Uwe


On 25 February 2016 at 22:43:30 CET, Jake Clawson <clawsonjake@yahoo.com.INVALID> wrote:
>I am trying to use StopFilter in Lucene 5.5.0. I tried the following:
>
>package lucenedemo;
>
>import java.io.StringReader;
>import java.util.ArrayList;
>import java.util.Arrays;
>import java.util.Collections;
>import java.util.HashSet;
>import java.util.List;
>import java.util.Set;
>import java.util.Iterator;
>
>import org.apache.lucene.*;
>import org.apache.lucene.analysis.*;
>import org.apache.lucene.analysis.standard.*;
>import org.apache.lucene.analysis.core.StopFilter;
>import org.apache.lucene.analysis.en.EnglishAnalyzer;
>import org.apache.lucene.analysis.standard.StandardAnalyzer;
>import org.apache.lucene.analysis.standard.StandardTokenizer;
>import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>import org.apache.lucene.analysis.util.CharArraySet;
>import org.apache.lucene.util.AttributeFactory;
>import org.apache.lucene.util.Version;
>
>public class lucenedemo {
>
>    public static void main(String[] args) throws Exception {
>        System.out.println(removeStopWords("hello how are you? I am fine. This is a great day!"));
>    }
>
>    public static String removeStopWords(String strInput) throws Exception {
>        AttributeFactory factory = AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY;
>        StandardTokenizer tokenizer = new StandardTokenizer(factory);
>        tokenizer.setReader(new StringReader(strInput));
>        tokenizer.reset();
>        CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
>
>        TokenStream streamStop = new StopFilter(tokenizer, stopWords);
>        StringBuilder sb = new StringBuilder();
>        CharTermAttribute charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);
>        streamStop.reset();
>        while (streamStop.incrementToken()) {
>            String term = charTermAttribute.toString();
>            sb.append(term + " ");
>        }
>
>        streamStop.end();
>        streamStop.close();
>
>        tokenizer.close();
>
>        return sb.toString();
>    }
>
>}
>
>
>But it gives me the following error:
>
>Exception in thread "main" java.lang.IllegalStateException: TokenStream
>contract violation: reset()/close() call missing, reset() called
>multiple times, or subclass does not call super.reset(). Please see
>Javadocs of TokenStream class for more information about the correct
>consuming workflow.
>at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:109)
>at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:527)
>at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:738)
>at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:159)
>at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
>at lucenedemo.lucenedemo.removeStopWords(lucenedemo.java:42)
>at lucenedemo.lucenedemo.main(lucenedemo.java:27)
>
>What exactly am I doing wrong here? I have closed both the Tokenizer
>and TokenStream classes. Is there something else I am missing here?
>
>Any help would be greatly appreciated.
>
>Thanks,
>Jake Clawson
>

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de 


