From Rebecca Watson
Subject Re: Stop words filter
Date Wed, 23 Jun 2010 03:20:36 GMT
i guess you are using lucene 2.9 or below if you're talking about
Tokens still...

here's some old code i used to use (not sure if i wrote it or grabbed it from
online examples - it's been a while since i used it!)
that grabs the set of tokens given a field name +
text to analyse (for any class that extends it... e.g. use it for a
per-field analyzer):

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public abstract class GenAnalyzer extends Analyzer {

	/**
	 * lucene Analyzer object
	 * @see org.apache.lucene.analysis.Analyzer
	 */
	protected Analyzer gan;

	/**
	 * A method to split text into tokens which are returned in the form of
	 * a TokenStream object. The text is read in using the Reader
	 * object. As analysers can be field specific the name of the field
	 * is also provided to the method.
	 * @see org.apache.lucene.analysis.Analyzer#tokenStream(java.lang.String, java.io.Reader)
	 * @param fieldName the name of the lucene field
	 * @param reader A Reader object containing string to split into tokens
	 * @return a TokenStream that represents the string split into tokens
	 * based on the field name (maybe field specific analyser).
	 */
	public TokenStream tokenStream(String fieldName, Reader reader) {
		return gan.tokenStream(fieldName, reader);
	}

	/**
	 * A method to split text into tokens which are returned in the form of
	 * a Token[]. The text is read in as a string.
	 * As analysers can be field specific the name of the field
	 * is also provided to the method.
	 * Similar to the tokenStream method except that the parameters
	 * and return type differ.
	 * @param fieldName the name of the lucene field
	 * @param text the text to be split into tokens
	 * @return a Token[] which represents the split text tokens.
	 * @throws IOException maybe thrown by call.
	 * @see org.apache.lucene.analysis.Token
	 */
	public Token[] getTokens(String fieldName, String text)
	throws IOException {
		TokenStream stream = gan.tokenStream(fieldName, new StringReader(text));
		ArrayList<Token> tokenList = new ArrayList<Token>();
		Token token = new Token();
		while (true) {
			// assumed call: the reusable-token next(Token) of lucene 2.x
			token = stream.next(token);
			if (token == null) break;
			// clone, because the stream may reuse the same Token instance
			tokenList.add((Token) token.clone());
		}
		return tokenList.toArray(new Token[0]);
	}
}

hope that helps, i haven't used this code for a while but it worked
when i used it last!

in lucene 2.9 the TokenStream next() methods are deprecated in favour of
incrementToken()... and
if you move to lucene 3 i think that's where attributes (AttributeSource) replace Tokens,
so all this code will need to be ported...
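
to give a feel for what the ported loop looks like, here's a minimal sketch of the attribute-style pattern (advance the stream, then read the current term from it). note that MiniTokenStream and TokenCollector are hypothetical stand-ins written so the sketch runs without lucene on the classpath; they are NOT lucene classes. the names incrementToken() and term() are borrowed from lucene's TokenStream.incrementToken() and TermAttribute.term(), and the whitespace split + lowercasing + stop word check is only a rough imitation of what a real analyzer chain does:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an attribute-style token stream (not a Lucene class).
class MiniTokenStream {
    private final String[] words;
    private final Set<String> stopWords;
    private int pos = 0;
    // Plays the role of a term attribute: overwritten on every advance.
    private String term;

    MiniTokenStream(String text, Set<String> stopWords) {
        String trimmed = text.trim();
        // Lowercase, then split on runs of whitespace (assumed tokenization rule).
        this.words = trimmed.isEmpty()
                ? new String[0]
                : trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        this.stopWords = stopWords;
    }

    // Advance to the next token that is not a stop word; false when exhausted.
    boolean incrementToken() {
        while (pos < words.length) {
            String candidate = words[pos++];
            if (!stopWords.contains(candidate)) {
                term = candidate;
                return true;
            }
        }
        return false;
    }

    // Text of the current token; only meaningful after incrementToken() returned true.
    String term() {
        return term;
    }
}

// Hypothetical helper: the attribute-style equivalent of the getTokens() loop.
class TokenCollector {
    static List<String> getTerms(String text, Set<String> stopWords) {
        MiniTokenStream stream = new MiniTokenStream(text, stopWords);
        List<String> terms = new ArrayList<String>();
        while (stream.incrementToken()) {
            terms.add(stream.term());
        }
        return terms;
    }
}
```

calling TokenCollector.getTerms(text, stopWords) with a Set<String> of lowercase stop words returns the remaining terms in order. with real lucene you would get the stream from analyzer.tokenStream(...) and read the term from an attribute registered on the stream, but the loop has the same shape.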

thanks :)


On 23 June 2010 10:49, Vinicius Carvalho wrote:
> Hello there! I've been using lucene as a Full Text Search solution for some
> time. And although I'm familiar with Analyzers and Stemmers I never used
> them directly.
> I'm testing a few experiments on Sentiment Analysis and our implementation
> needs to perform stemming and stop word removal. I thought of using lucene's
> built-in support to spare me some coding time.
> Is there any example? I'm trying
> TokenStream stream = analyzer.tokenStream("", new StringReader(inputStr));
> Problem is that I could not find a way to get the result tokens. I was
> expecting something like stream.getTokens:Token[] :P
> Could someone point me in the right direction?
> Regards

