lucene-dev mailing list archives

From "Yonik Seeley"
Subject new Token API
Date Sun, 18 Nov 2007 02:21:19 GMT
Regarding the recent changes in Token (reusability and the use of char[]
instead of String):

1) If we are deprecating some methods like String termText(), how
about at the same time deprecating "String type"?  If we want
lightweight per-token metadata for communication between filters, an
int or a long used as a bitvector (32 or 64 independent boolean vars
per token) would be much more useful than a single String.
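A rough sketch of what I mean, with made-up names (FlaggedToken, KEYWORD,
ACRONYM, STEMMED are hypothetical and not part of the current Token API).
Each flag is one bit of an int, so filters can set and test several of them
independently:

```java
// Hypothetical sketch only: this is NOT Lucene's Token class.
final class FlaggedToken {
    // Each constant occupies one bit, so an int carries up to 32 independent flags.
    static final int KEYWORD = 1 << 0;
    static final int ACRONYM = 1 << 1;
    static final int STEMMED = 1 << 2;

    private int flags; // all bits clear by default

    // Turn on every bit present in mask.
    void set(int mask) { flags |= mask; }

    // Turn off every bit present in mask.
    void unset(int mask) { flags &= ~mask; }

    // True only if every bit in mask is currently set.
    boolean has(int mask) { return (flags & mask) == mask; }

    int flags() { return flags; }
}

class FlaggedTokenDemo {
    public static void main(String[] args) {
        FlaggedToken t = new FlaggedToken();
        // An upstream filter marks the token with two independent facts.
        t.set(FlaggedToken.KEYWORD | FlaggedToken.STEMMED);
        // A downstream filter tests only the bit it cares about.
        System.out.println("keyword? " + t.has(FlaggedToken.KEYWORD));
        System.out.println("acronym? " + t.has(FlaggedToken.ACRONYM));
    }
}
```

A single String type can only say one thing about a token at a time; with a
bitvector, unrelated filters can each own a bit without stepping on each other.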

2) I think we need to clarify who is responsible for "cleaning up" a
token's state when it's being reused (or whether it needs to be cleaned
up at all). For example, in the CharTokenizer, the token type, token
payload, and positionIncrement are not reset, so they carry over the
previous token's values. Is this a) a bug, b) guaranteed behavior one
can depend on, or c) undefined? Since this includes positionIncrement,
I'm inclined to say that this is a bug. There is a Token.clear();
should it be called by the caller or by the Tokenizer?
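To make the question concrete, here is a minimal sketch of one possible
answer, where the producer calls clear() before filling in the reused token.
Everything in it is hypothetical (ReusableToken, SketchTokenizer, and the
default values chosen in clear() are my assumptions for illustration, not the
actual Token or CharTokenizer code):

```java
// Hypothetical sketch only: a stand-in for a reusable token, not Lucene's Token.
final class ReusableToken {
    char[] buffer = new char[16];
    int length;
    int positionIncrement = 1;
    String type = "word";
    byte[] payload;

    // Reset all per-token state to defaults (defaults here are assumed).
    void clear() {
        length = 0;
        positionIncrement = 1;
        type = "word";
        payload = null;
    }

    // Copy the term into the reusable buffer, growing it if needed.
    void setTerm(String s) {
        if (s.length() > buffer.length) buffer = new char[s.length()];
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    String term() { return new String(buffer, 0, length); }
}

// Hypothetical whitespace tokenizer that takes responsibility for cleanup.
final class SketchTokenizer {
    private final String[] words;
    private int pos;

    SketchTokenizer(String text) {
        String trimmed = text.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Returns the same instance refilled, or null when the input is exhausted.
    ReusableToken next(ReusableToken reuse) {
        if (pos >= words.length) return null;
        reuse.clear();               // producer resets stale state first
        reuse.setTerm(words[pos++]);
        return reuse;
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer("quick brown fox");
        ReusableToken token = new ReusableToken();
        for (ReusableToken t = tokenizer.next(token); t != null; t = tokenizer.next(token)) {
            System.out.println(t.term() + " posInc=" + t.positionIncrement);
            t.positionIncrement = 5; // a consumer tweak that must not leak forward
        }
    }
}
```

With the clear() call in next(), a value a consumer sets on one token does not
leak into the next one. Without it, the positionIncrement of 5 would silently
carry over, which is the behavior I'm asking about.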
