lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: new Token API
Date Mon, 19 Nov 2007 15:04:07 GMT
On Nov 18, 2007 6:07 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
> a quick test tokenizing all of Wikipedia w/
> SimpleAnalyzer showed 6-8% overall slowdown if I call token.clear() in
> ReadTokensTask.java.

We could slim down clear() a little by only resetting certain things...
startOffset and endOffset need to be set each time if anyone cares
about offsets, so they don't really need to be reset.  The only
tokenizer to use "type" sets it every time AFAIK, so one could argue
for skipping that as well.  Not sure if the small performance gain
would be worth it though.
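
A rough sketch of the idea, for illustration only. SketchToken is a
hypothetical stand-in and not the real Token class; the field set, the
"word" default type, and the clearSlim() name are all assumptions made
for this example.

```java
// Hypothetical stand-in for a reusable token; NOT Lucene's actual Token class.
// Field names and defaults are assumptions chosen for illustration.
class SketchToken {
    static final String DEFAULT_TYPE = "word";

    char[] termBuffer = new char[16];
    int termLength;
    int startOffset;
    int endOffset;
    int positionIncrement = 1;
    String type = DEFAULT_TYPE;
    Object payload;

    // Full reset: every per-token field goes back to its default.
    void clear() {
        termLength = 0;
        startOffset = 0;
        endOffset = 0;
        positionIncrement = 1;
        type = DEFAULT_TYPE;
        payload = null;
    }

    // Slimmer reset: leaves startOffset, endOffset and type alone, on the
    // assumption that any tokenizer that cares about them overwrites them
    // for every token it produces.
    void clearSlim() {
        termLength = 0;
        positionIncrement = 1;
        payload = null;
    }
}
```

The catch is that after clearSlim() the offsets and type hold whatever
the previous token left behind, so a tokenizer that does not set them on
every token would leak stale values.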

 -Yonik


