lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Add Token.copyInto(Token) API
Date Wed, 12 Nov 2008 20:02:49 GMT
Are you aware of LUCENE-1422?  There is likely going to be a new way
of dealing with TokenStreams altogether, so you might want to have a
look there before continuing.

On Nov 12, 2008, at 1:51 PM, Shai Erera wrote:

> Hi,
> I was thinking about adding a copyInto method to Token. Currently the
> only way to clone a token is with its clone() or clone(char[], int, int,
> int, int) methods. Both do the job, but both allocate a Token instance.
> In 2.4 a Token constructor can take a char[] as input (saving a char[]
> allocation), but it still allocates an instance.
> Even though the instance allocation itself is not that expensive, it
> also allocates additional things, like a String for the type, a Payload,
> and a String for the text (even though that will be removed in 3.0).
> If an application wishes to keep one instance of Token around, and  
> copy into it other Tokens, it can call various methods to achieve  
> that, like setTermBuffer, setOffset, etc. A copyInto is just a
> convenience method for doing that.
> If you wonder about the use case, here it is: I know that it's
> advised to reuse the same Token instance in the TokenStream API
> (basically, make sure to call next(Token)). But there might be
> TokenFilters that need to save a certain occurrence of a
> token, do some processing, and return it later. A good example is
> StemmingFilter. One can think of such a filter returning the
> original token in addition to the stemmed token (for example, for the
> word "tokens" in English, it will return "tokens" [original] and
> "token" [stem]). In that case, the filter has to save the word
> "tokens" so that it returns "tokens" first (or the stem, the order
> does not matter), and the next time its next(Token) is called, it
> returns the stem (or the original) before consuming the next token from
> the TokenStream.
> Anyway, I hope it's clear enough, but if not I can elaborate.
> If you think a copyInto() is worth the effort, I can quickly create
> a patch for it.
> Shai
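
For readers following the quoted proposal, here is a minimal sketch of what a copyInto could look like and how a filter might use it to return the original token and then its stem. It deliberately does not use Lucene: SimpleToken, OriginalAndStemFilter, toyStem and CopyIntoSketch are hypothetical stand-ins invented for illustration, the field set is a guess at what a real copyInto would need to copy, and the "stemmer" just strips a trailing "s". Treat it as one possible shape under those assumptions, not as the patch that was proposed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for a token. This is NOT Lucene's Token class.
final class SimpleToken {
    char[] termBuffer = new char[16];
    int termLength;
    int startOffset;
    int endOffset;
    String type = "word";

    // Overwrites this token's text and offsets, growing the buffer only if needed.
    void set(String text, int start, int end) {
        if (text.length() > termBuffer.length) {
            termBuffer = new char[text.length()];
        }
        text.getChars(0, text.length(), termBuffer, 0);
        termLength = text.length();
        startOffset = start;
        endOffset = end;
    }

    // Sketch of the proposed copyInto: copies this token's state into an
    // existing target instead of allocating a new token as clone() would.
    void copyInto(SimpleToken target) {
        if (target.termBuffer.length < termLength) {
            target.termBuffer = new char[termLength];
        }
        System.arraycopy(termBuffer, 0, target.termBuffer, 0, termLength);
        target.termLength = termLength;
        target.startOffset = startOffset;
        target.endOffset = endOffset;
        target.type = type;
    }

    String term() {
        return new String(termBuffer, 0, termLength);
    }
}

// Hypothetical filter that returns each word and then its stem, keeping one
// saved token around because the caller may overwrite the reusable token.
final class OriginalAndStemFilter {
    private final Iterator<String> input;
    private final SimpleToken saved = new SimpleToken(); // allocated once, reused
    private boolean stemPending;
    private int offset;

    OriginalAndStemFilter(List<String> words) {
        this.input = words.iterator();
    }

    // Mimics the next(Token) reuse pattern: fills `reusable`, or returns null at the end.
    SimpleToken next(SimpleToken reusable) {
        if (stemPending) {
            stemPending = false;
            saved.copyInto(reusable); // restore the saved original
            reusable.set(toyStem(reusable.term()), reusable.startOffset, reusable.endOffset);
            reusable.type = "stem";
            return reusable;
        }
        if (!input.hasNext()) {
            return null;
        }
        String word = input.next();
        reusable.set(word, offset, offset + word.length());
        reusable.type = "word";
        offset += word.length() + 1;
        reusable.copyInto(saved); // remember the original for the next call
        stemPending = true;
        return reusable;
    }

    // Toy stemmer (an assumption for the demo): strips a single trailing "s".
    static String toyStem(String w) {
        return (w.length() > 1 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }
}

final class CopyIntoSketch {
    // Drives the filter with a single reusable token and records "term/type" pairs.
    static List<String> run(List<String> words) {
        OriginalAndStemFilter filter = new OriginalAndStemFilter(words);
        SimpleToken reusable = new SimpleToken();
        List<String> out = new ArrayList<>();
        while (filter.next(reusable) != null) {
            out.add(reusable.term() + "/" + reusable.type);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("tokens", "stream")));
    }
}
```

In this sketch the filter allocates exactly one extra token for its whole lifetime. The copy into `saved` matters because the caller owns the reusable token and may change it between calls to next.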
