lucene-dev mailing list archives

From Wolfgang Hoschek <>
Subject contrib: keywordTokenStream
Date Tue, 03 May 2005 20:26:33 GMT
Here's a convenience add-on method for MemoryIndex. If it turns out to be
of wider use, it could be moved into the core analysis package; for the
moment, MemoryIndex might be the better home.
Opinions, anyone?


	// Assumes imports: java.util.Collection, java.util.Iterator,
	// org.apache.lucene.analysis.Token, org.apache.lucene.analysis.TokenStream

	/**
	 * Convenience method; creates and returns a token stream that generates a
	 * token for each keyword in the given collection, "as is", without any
	 * transforming text analysis. The resulting token stream can be fed into
	 * {@link #addField(String, TokenStream)}, perhaps wrapped into another
	 * {@link org.apache.lucene.analysis.TokenFilter}, as desired.
	 * 
	 * @param keywords
	 *            the keywords to generate tokens for
	 * @return the corresponding token stream
	 */
	public TokenStream keywordTokenStream(final Collection keywords) {
		if (keywords == null)
			throw new IllegalArgumentException("keywords must not be null");
		return new TokenStream() {
			private Iterator iter = keywords.iterator();
			private int start = 0; // start offset of the next token
			public Token next() {
				if (!iter.hasNext()) return null; // null signals end of stream
				Object obj = iter.next();
				if (obj == null)
					throw new IllegalArgumentException("keyword must not be null");
				String term = obj.toString();
				Token token = new Token(term, start, start + term.length());
				start += term.length() + 1; // separate words by 1 (blank) character
				return token;
			}
		};
	}

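To make the offset scheme concrete, here is a small Lucene-free sketch. The class KeywordOffsets, its stand-in SimpleToken and the tokenize() helper are hypothetical names for illustration only, not part of MemoryIndex or Lucene. It reproduces the bookkeeping of next() and shows that each token's offsets index the keyword inside the keywords joined by single blanks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical, Lucene-free sketch of the offset bookkeeping used by
 * keywordTokenStream: each keyword becomes one token with offsets
 * [start, start + length), and start then advances by length + 1,
 * as if the keywords were joined by single blanks.
 */
public class KeywordOffsets {

    /** Hypothetical stand-in for org.apache.lucene.analysis.Token (text + offsets). */
    public static final class SimpleToken {
        public final String text;
        public final int startOffset;
        public final int endOffset;

        SimpleToken(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + "[" + startOffset + "," + endOffset + ")";
        }
    }

    /** Emits one token per keyword "as is", mirroring the logic of next(). */
    public static List<SimpleToken> tokenize(Collection<?> keywords) {
        if (keywords == null)
            throw new IllegalArgumentException("keywords must not be null");
        List<SimpleToken> tokens = new ArrayList<>();
        int start = 0;
        for (Iterator<?> iter = keywords.iterator(); iter.hasNext(); ) {
            Object obj = iter.next();
            if (obj == null)
                throw new IllegalArgumentException("keyword must not be null");
            String term = obj.toString();
            tokens.add(new SimpleToken(term, start, start + term.length()));
            start += term.length() + 1; // separate words by 1 (blank) character
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("New York", "lucene", "2005");
        String joined = String.join(" ", keywords);
        for (SimpleToken t : tokenize(keywords)) {
            // Each token's offsets select its keyword from the blank-joined text.
            System.out.println(t + " -> \"" + joined.substring(t.startOffset, t.endOffset) + "\"");
        }
    }
}
```

With this input, "New York" stays a single token at [0,8), "lucene" lands at [9,15) and "2005" at [16,20): no splitting or lowercasing takes place, which is the point of feeding keywords through unanalyzed.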