lucene-dev mailing list archives

From Wolfgang Hoschek <whosc...@lbl.gov>
Subject contrib: keywordTokenStream
Date Tue, 03 May 2005 20:26:33 GMT
Here's a convenience add-on method for MemoryIndex. If it turns out to
be of wider use, it could be moved into the core analysis package. For
the moment, MemoryIndex might be a better home.
Opinions, anyone?

Wolfgang.

	/**
	 * Convenience method; Creates and returns a token stream that generates a
	 * token for each keyword in the given collection, "as is", without any
	 * transforming text analysis. The resulting token stream can be fed into
	 * {@link #addField(String, TokenStream)}, perhaps wrapped into another
	 * {@link org.apache.lucene.analysis.TokenFilter}, as desired.
	 *
	 * @param keywords
	 *            the keywords to generate tokens for
	 * @return the corresponding token stream
	 */
	public TokenStream keywordTokenStream(final Collection keywords) {
		if (keywords == null)
			throw new IllegalArgumentException("keywords must not be null");
		
		return new TokenStream() {
			private final Iterator iter = keywords.iterator();
			private int start = 0; // start offset of the next token

			public Token next() {
				if (!iter.hasNext()) return null; // end of stream
				
				Object obj = iter.next();
				if (obj == null)
					throw new IllegalArgumentException("keyword must not be null");
				
				String term = obj.toString();
				Token token = new Token(term, start, start + term.length());
				start += term.length() + 1; // separate words by 1 (blank) character
				return token;
			}
		};
	}
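
To make the offset bookkeeping concrete, here is a minimal, self-contained sketch of the same arithmetic that runs without Lucene on the classpath. It is an illustration only: SimpleToken is a hypothetical stand-in for Lucene's Token, and tokenize() and KeywordOffsetsDemo are made-up names that are not part of MemoryIndex. Unlike the method above, the sketch builds the whole list eagerly instead of streaming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class KeywordOffsetsDemo {

    /** Hypothetical stand-in for Lucene's Token: term text plus start/end offsets. */
    static final class SimpleToken {
        final String term;
        final int start;
        final int end;

        SimpleToken(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    /** Applies the same offset arithmetic as keywordTokenStream, but eagerly. */
    static List<SimpleToken> tokenize(Collection<?> keywords) {
        if (keywords == null)
            throw new IllegalArgumentException("keywords must not be null");

        List<SimpleToken> tokens = new ArrayList<>();
        int start = 0; // start offset of the next token
        for (Object obj : keywords) {
            if (obj == null)
                throw new IllegalArgumentException("keyword must not be null");
            String term = obj.toString(); // keyword is taken "as is"
            tokens.add(new SimpleToken(term, start, start + term.length()));
            start += term.length() + 1; // separate words by 1 (blank) character
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo[0,3), new york[4,12), 42[13,15)]
        System.out.println(tokenize(Arrays.asList("foo", "new york", 42)));
    }
}
```

Note that a multi-word keyword such as "new york" stays a single token, and that non-String elements are converted with toString(), as in the method above.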

