lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: new Token API
Date Tue, 20 Nov 2007 00:34:23 GMT
On Nov 19, 2007 7:02 PM, Doug Cutting <cutting@apache.org> wrote:
> Yonik Seeley wrote:
> > 1) If we are deprecating some methods like String termText(), how
> > about at the same time deprecating "String type"?  If we want
> > lightweight per-token metadata for communication between filters, an
> > int or a long used as a bitvector (32 or 64 independent boolean vars
> > per token) would be much more useful than a single String.
>
> There are tokenizers that use the type string, e.g., StandardFilter &
> similar things in Nutch.  How would you replace such uses?  Add a bit
> for each token type?  Is that really that much more useful?

It is, because it lets a token have more than one type at once.
The benefit is probably relatively minor (few people would use it),
and I wouldn't have brought it up except that it could piggy-back
on the other recent changes to Token.
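
To illustrate the bitvector idea, here is a minimal sketch. It is not
Lucene code: the class FlaggedToken, its methods, and the flag names are
all hypothetical, chosen only to show how an int could carry up to 32
independent boolean properties per token, so that one token can hold
several "types" at once.

```java
// Hypothetical sketch only; FlaggedToken is NOT part of the Lucene API.
final class FlaggedToken {
    // One bit per independent boolean property (illustrative names).
    static final int ALPHANUM   = 1 << 0;
    static final int ACRONYM    = 1 << 1;
    static final int APOSTROPHE = 1 << 2;
    static final int KEYWORD    = 1 << 3;

    private final String text;
    private int flags; // room for 32 independent booleans

    FlaggedToken(String text) {
        this.text = text;
    }

    String text() {
        return text;
    }

    int flags() {
        return flags;
    }

    // Turn on every bit in mask, leaving the others untouched.
    void set(int mask) {
        flags |= mask;
    }

    // Turn off every bit in mask, leaving the others untouched.
    void clear(int mask) {
        flags &= ~mask;
    }

    // True only if every bit in mask is set.
    boolean hasAll(int mask) {
        return (flags & mask) == mask;
    }

    // True if at least one bit in mask is set.
    boolean hasAny(int mask) {
        return (flags & mask) != 0;
    }
}
```

Under these assumptions, a filter could call set(FlaggedToken.ACRONYM)
and a later filter could call set(FlaggedToken.KEYWORD) on the same
token; hasAll(ACRONYM | KEYWORD) would then be true, which a single
String type cannot express.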

-Yonik


