lucene-dev mailing list archives

From "Mark Miller (JIRA)" <>
Subject [jira] Commented: (LUCENE-937) Make CachingTokenFilter faster
Date Fri, 22 Jun 2007 01:03:25 GMT


Mark Miller commented on LUCENE-937:

My earlier tests must have gotten out of whack. I was measuring a much bigger difference than
I see now.

As a result, I started from scratch, carefully creating and labeling a new Lucene core jar
for each case and averaging the performance over 15,000 calls that create and read TokenStreams
off the Reuters data.
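
For illustration only, a minimal sketch of that kind of averaging. This is not the harness
used for the numbers in this message: the class and method names are hypothetical, and the
trivial task in main stands in for creating and reading a TokenStream.

```java
public class AverageTiming {
    /** Runs the task `calls` times and returns the mean wall-clock time per call, in nanoseconds. */
    static double averageNanos(int calls, Runnable task) {
        if (calls <= 0) {
            throw new IllegalArgumentException("calls must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption); swap in the code under test.
        int[] sink = new int[1];
        Runnable task = () -> sink[0] += 1;

        averageNanos(1_000, task); // warm-up pass so the JIT has a chance to compile the task
        double avg = averageNanos(15_000, task);
        System.out.println("average ns per call: " + avg);
    }
}
```

Averaging over many calls smooths out per-call noise, but it does not remove JIT or GC
effects, so numbers from a loop like this are best read as relative, not absolute.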

After very thorough testing (I was in quite a hurry this morning), I have come up with the
following results:

LinkedList() using get, LinkedList() using iterator, and ArrayList() are practically identical
in speed.

ArrayList(30) gave a 47% increase in speed. Initial capacities above the 30-60 range gave no further gains.

This patch should not go through as is. What do you think given these results? I assumed that
an ArrayList would be faster as all of the data is guaranteed contiguous, but it surprised
me that the resizing was not enough to slow things down to LinkedList speed (unless you start
with too low an initial size -- default is 10).
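
A rough sketch of the variants compared above, for readers who want to try them. It
assumes a simplified cache of String tokens instead of Lucene's Token objects, and the class
and method names are hypothetical; it is not the code in CachingTokenFilter.patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class TokenCacheSketch {
    /** Copies every token into the given list, as a caching filter would on its first pass. */
    static List<String> fill(List<String> cache, String[] tokens) {
        for (String token : tokens) {
            cache.add(token);
        }
        return cache;
    }

    /** Replays the cache by index: constant time per get on ArrayList, a list walk per get on LinkedList. */
    static int replayByIndex(List<String> cache) {
        int chars = 0;
        for (int i = 0; i < cache.size(); i++) {
            chars += cache.get(i).length();
        }
        return chars;
    }

    /** Replays the cache with an iterator, which is sequential on both list types. */
    static int replayByIterator(List<String> cache) {
        int chars = 0;
        for (Iterator<String> it = cache.iterator(); it.hasNext();) {
            chars += it.next().length();
        }
        return chars;
    }

    public static void main(String[] args) {
        String[] tokens = "the quick brown fox jumps over the lazy dog".split(" ");

        List<String> linked = fill(new LinkedList<String>(), tokens);
        List<String> defaultSized = fill(new ArrayList<String>(), tokens);
        // Presized so that streams of up to 30 tokens never trigger a resize.
        List<String> presized = fill(new ArrayList<String>(30), tokens);

        System.out.println(replayByIndex(linked));
        System.out.println(replayByIterator(linked));
        System.out.println(replayByIndex(defaultSized));
        System.out.println(replayByIndex(presized));
    }
}
```

All four calls return the same total; only the cost of filling and replaying differs, which
is what the timings above measure.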

- Mark

> Make CachingTokenFilter faster
> ------------------------------
>                 Key: LUCENE-937
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachingTokenFilter.patch
> The wrong data structure was used for the CachingTokenFilter. It should be an ArrayList
> rather than a LinkedList. There is a noticeable difference in speed.
