lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Issue with Solr TokenFilter and the new TokenStream API
Date Thu, 06 Aug 2009 15:21:48 GMT
Uwe, look at the patch I pasted in haste (I have a delivery guy here, sorry).

The filter had a bug all along (it was using termBuffer.length, which is
the size of the whole reusable array rather than the length of the
current term, for some length calculations).
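
A minimal sketch of this kind of bug, for anyone following along. It is not
the patch and not the Solr code: the class and method names are made up for
illustration, and a plain HashSet<String> stands in for the keep set. It
assumes only that the reusable char[] can be longer than the current term
and can still hold characters from an earlier token.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in, not Solr's CapitalizationFilterFactory.
class TermLengthSketch {

    // Buggy variant: treats the whole array as the term.
    // termLength is ignored here, which is the point of the example.
    static boolean keepBuggy(Set<String> keep, char[] termBuffer, int termLength) {
        return keep.contains(new String(termBuffer, 0, termBuffer.length));
    }

    // Fixed variant: only the first termLength chars belong to the term.
    static boolean keepFixed(Set<String> keep, char[] termBuffer, int termLength) {
        return keep.contains(new String(termBuffer, 0, termLength));
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>();
        keep.add("and");

        // Reused buffer: first filled with a 10-char term, then
        // overwritten in place with the 3-char term "and".
        char[] termBuffer = new char[10];
        "capitalize".getChars(0, 10, termBuffer, 0);
        "and".getChars(0, 3, termBuffer, 0);
        int termLength = 3;

        // The buffer now holds "anditalize"; only "and" is valid.
        System.out.println(keepBuggy(keep, termBuffer, termLength)); // false
        System.out.println(keepFixed(keep, termBuffer, termLength)); // true
    }
}
```

With the buggy variant the lookup sees the stale tail of the buffer, so a
term that is in the keep set never matches.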

On Thu, Aug 6, 2009 at 11:17 AM, Uwe Schindler<uwe@thetaphi.de> wrote:
> I looked into the code of this Filter. It is very simple and should work out
> of the box. There is no cloning done. When the indexer calls incrementToken,
> the delegation to next(Token) does not clone at all. It just uses the
> encapsulated Token instance (inside the AttributeImpl TokenWrapper) as
> reusableToken and calls next(reusable) and then replaces the encapsulated
> instance by the return value of next() -- so no cloning. As you do not
> change the token instance at all and return the reusable token it is all
> done on one Token/Attribute instance.
>
> In my opinion, this is the simplest TokenFilter that could occur; it just
> changes the contents of the buffer. By the way, this one could be easily
> rewritten to use incrementToken() without cloning, just use
> termAtt.setTermBuffer() and so on.
>
> Where do you see a problem, does it simply not work or do you think there
> could be an issue?
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Thursday, August 06, 2009 4:14 PM
>> To: java-dev@lucene.apache.org
>> Subject: Issue with Solr TokenFilter and the new TokenStream API
>>
>> I think there is an issue here, but I didn't follow the TokenStream
>> improvements very closely.
>>
>> In Solr, CapitalizationFilterFactory has a CharArraySet that it loads
>> up with keep words - it then checks (with the old TokenStream API) each
>> token (char array) to see if it should keep it. I think because of the
>> cloning going on in next, this breaks and you can't match anything in
>> the keep set. Does that make sense?
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>



-- 
Robert Muir
rcmuir@gmail.com


