lucene-dev mailing list archives

From "Mark Miller (JIRA)" <>
Subject [jira] Commented: (LUCENE-937) Make CachingTokenFilter faster
Date Fri, 22 Jun 2007 02:02:25 GMT


Mark Miller commented on LUCENE-937:

The 15,000 calls are each on a separate document. The documents are relatively small...newspaper
articles from Reuters. Anything smaller would have to be very small.

I have again carefully tested LinkedList get VS LinkedList iterator and the performance is
identical as far as I can tell.
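For reference, the two LinkedList traversal styles being compared presumably look something like the sketch below. The class and method names are hypothetical and nothing here is taken from the patch. Note that LinkedList.get(i) walks the node chain from the nearer end on every call, so any gap between the two styles would be expected to grow with list length and may well be too small to measure on short token lists.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical illustration only; not code from CachingTokenFilter.patch.
class LinkedListTraversalSketch {

    // Indexed access: each get(i) walks the chain from the nearer end of the list.
    static int totalLengthByGet(LinkedList<String> tokens) {
        int total = 0;
        for (int i = 0; i < tokens.size(); i++) {
            total += tokens.get(i).length();
        }
        return total;
    }

    // Iterator access: one step along the chain per token.
    static int totalLengthByIterator(LinkedList<String> tokens) {
        int total = 0;
        for (Iterator<String> it = tokens.iterator(); it.hasNext();) {
            total += it.next().length();
        }
        return total;
    }

    public static void main(String[] args) {
        LinkedList<String> tokens =
            new LinkedList<String>(Arrays.asList("make", "caching", "token", "filter", "faster"));
        // Both traversals visit the same tokens in the same order.
        System.out.println(totalLengthByGet(tokens) + " " + totalLengthByIterator(tokens));
    }
}
```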

I'll do more work to prove my case when I get a free moment, but just to be clear:

I am using small documents (15,000 different varying sized docs), measuring the total time
and dividing by 15,000. The results show a 43% improvement using ArrayList(30). I will run
a test with even smaller docs when I get a chance. In my work with a new Span based Highlighter,
I need this speed or my implementation is slower than the old Highlighter. With this boost,
my Span based Highlighter is actually (very) slightly faster. If you decide to keep things
as they are I will have to roll an alternate CachingTokenFilter for my Highlighter (no problem
of course <g>).
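A rough sketch of the kind of measurement described above: cache each document's tokens in either a LinkedList or an ArrayList(30), replay them once, and divide the total time by the number of documents. TokenCacheSketch, its methods, and the whitespace tokenizer are all stand-ins invented for illustration. This is not Lucene's CachingTokenFilter or the attached patch, and a naive timing loop like this (no warm-up, no repeated runs) should not be expected to reproduce the 43% figure.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a token cache; not Lucene's CachingTokenFilter.
class TokenCacheSketch {
    private final List<String> cache;   // backing store under comparison
    private Iterator<String> replay;    // cursor used when replaying cached tokens

    TokenCacheSketch(List<String> backing) {
        this.cache = backing;
    }

    // Assumption: whitespace splitting stands in for a real analyzer.
    void fill(String text) {
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                cache.add(token);
            }
        }
    }

    // Rewind so the cached tokens can be consumed again from the start.
    void reset() {
        replay = cache.iterator();
    }

    // Returns the next cached token, or null when the cache is exhausted.
    String next() {
        if (replay == null) {
            reset();
        }
        return replay.hasNext() ? replay.next() : null;
    }

    // Total elapsed time divided by the number of documents, as in the message.
    static long averageNanosPerDoc(boolean useArrayList, String[] docs) {
        long start = System.nanoTime();
        for (String doc : docs) {
            List<String> backing = useArrayList
                ? new ArrayList<String>(30)   // initial capacity of 30, as in ArrayList(30)
                : new LinkedList<String>();
            TokenCacheSketch cache = new TokenCacheSketch(backing);
            cache.fill(doc);
            cache.reset();
            while (cache.next() != null) {
                // consume the replayed tokens
            }
        }
        long elapsed = System.nanoTime() - start;
        return docs.length == 0 ? 0 : elapsed / docs.length;
    }
}
```

Calling averageNanosPerDoc(true, docs) and averageNanosPerDoc(false, docs) on the same array of document strings gives the two per-document averages to compare.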

Perhaps it is best to just leave things as they are and if you need more performance on docs
with more than a handful of tokens, make your own Caching Filter. If the common case is closer
to docs the size of newspaper articles or larger, a 43% gain is hard to ignore. 

I will get back about the speed when using very short documents.

>>Only the pointers to the objects are contiguous, right? 
One of these days I will actually make that transition from C++ to Java <g>. I don't
know where the speed is coming from then...but it's a heck of a difference.

- Mark

> Make CachingTokenFilter faster
> ------------------------------
>                 Key: LUCENE-937
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachingTokenFilter.patch
> The wrong data structure was used for the CachingTokenFilter. It should be an ArrayList
> rather than a LinkedList. There is a noticeable difference in speed.

