lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: contrib: keywordTokenStream
Date Wed, 04 May 2005 00:26:16 GMT
Wolfgang,

I've now added this.  I'm not seeing how this could be generally  
useful.  I'm curious how you are using it and why it is better suited  
for what you're doing than any other analyzer.

"keyword tokenizer" is a bit overloaded terminology-wise, though -  
look in the contrib/analyzers/src/java area to see what I mean.

     Erik

On May 3, 2005, at 4:26 PM, Wolfgang Hoschek wrote:

> Here's a convenience add-on method to MemoryIndex. If it turns out  
> that this could be of wider use, it could be moved into the core  
> analysis package. For the moment the MemoryIndex might be a better  
> home. Opinions, anyone?
>
> Wolfgang.
>
>     /**
>      * Convenience method; Creates and returns a token stream that generates a
>      * token for each keyword in the given collection, "as is", without any
>      * transforming text analysis. The resulting token stream can be fed into
>      * {@link #addField(String, TokenStream)}, perhaps wrapped into another
>      * {@link org.apache.lucene.analysis.TokenFilter}, as desired.
>      *
>      * @param keywords
>      *            the keywords to generate tokens for
>      * @return the corresponding token stream
>      */
>     public TokenStream keywordTokenStream(final Collection keywords) {
>         if (keywords == null)
>             throw new IllegalArgumentException("keywords must not be null");
>
>         return new TokenStream() {
>             Iterator iter = keywords.iterator();
>             int pos = 0;
>             int start = 0;
>             public Token next() {
>                 if (!iter.hasNext()) return null;
>
>                 Object obj = iter.next();
>                 if (obj == null)
>                     throw new IllegalArgumentException("keyword must not be null");
>
>                 String term = obj.toString();
>                 Token token = new Token(term, start, start + term.length());
>                 start += term.length() + 1; // separate words by 1 (blank) character
>                 pos++;
>                 return token;
>             }
>         };
>     }
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
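
For readers following the thread, here is a minimal, self-contained sketch of the offset arithmetic in the quoted method. It does not use Lucene: SketchToken and KeywordOffsetsSketch are hypothetical stand-ins invented for this illustration, and the list is built eagerly rather than lazily through a TokenStream as in the quoted code. It only aims to show that each keyword is emitted unchanged and that consecutive keywords are spaced as if one blank character separated them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class KeywordOffsetsSketch {

    // Hypothetical stand-in for illustration; this is NOT Lucene's Token class.
    static final class SketchToken {
        final String term;
        final int startOffset;
        final int endOffset;

        SketchToken(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + "[" + startOffset + "," + endOffset + ")";
        }
    }

    // Mirrors the offset logic of the quoted method: one token per keyword,
    // taken "as is", with a gap of one character between keywords.
    static List<SketchToken> keywordTokens(Collection<?> keywords) {
        if (keywords == null)
            throw new IllegalArgumentException("keywords must not be null");

        List<SketchToken> tokens = new ArrayList<>();
        int start = 0;
        for (Object obj : keywords) {
            if (obj == null)
                throw new IllegalArgumentException("keyword must not be null");

            String term = obj.toString();
            tokens.add(new SketchToken(term, start, start + term.length()));
            start += term.length() + 1; // separate words by 1 (blank) character
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "New York" stays a single token because no text analysis is applied.
        for (SketchToken t : keywordTokens(Arrays.asList("foo", "lucene", "New York"))) {
            System.out.println(t);
        }
        // prints:
        // foo[0,3)
        // lucene[4,10)
        // New York[11,19)
    }
}
```

The multi-word keyword is the case where this differs from a whitespace-splitting analyzer: the whole string becomes one term.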

