lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <>
Subject [jira] Commented: (LUCENE-937) Make CachingTokenFilter faster
Date Fri, 22 Jun 2007 01:32:26 GMT


Doron Cohen commented on LUCENE-937:

> Mark: I assumed that an AL would be faster as all of the data is guaranteed contiguous

Only the pointers to the objects are contiguous, right? The tokens themselves are, well, wherever
they happen to be. A LinkedList, though, also creates a new node object per element, holding the
token and the pointers to the neighboring list members. So it is probably safe to say that if you
can estimate the list size (avoiding array growth), AL is preferable as long as every add/remove
happens at the end.
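To make the "estimate the list size" point concrete, here is a minimal sketch. The class name ListChoiceSketch, the helper fill() and the size of 1000 are made up for illustration and are not taken from the Lucene sources. It appends only at the end, once into an ArrayList presized with the estimate (so the backing array never has to grow) and once into a LinkedList (which allocates one node per element).

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListChoiceSketch {
    // Append-only fill, the access pattern a token cache would have.
    public static List<String> fill(List<String> target, int count) {
        for (int i = 0; i < count; i++) {
            target.add("token" + i);
        }
        return target;
    }

    public static void main(String[] args) {
        int estimatedSize = 1000; // assumption: the size can be estimated up front

        // Presized ArrayList: one backing array, no regrowth while filling.
        List<String> presized = fill(new ArrayList<String>(estimatedSize), estimatedSize);

        // LinkedList: no backing array to grow, but one extra node object per token.
        List<String> linked = fill(new LinkedList<String>(), estimatedSize);

        // Both lists hold the same tokens in the same order.
        System.out.println(presized.equals(linked));
    }
}
```

The contents end up identical; the difference is only in how many objects are allocated and whether an array copy ever happens, which is what any timing comparison would be measuring.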

> Michael: (~)  LL iterator comparable to AL

That's a good point. I had the impression that AL is always simpler than LL and that, unless you
remove or add somewhere other than the end, it is preferable (that's why I excluded the
NgramTokenFilters that use LL.removeFirst()). Now you're saying that with iteration (instead of
direct access) LinkedList is supposed to be faster. Could be, since then there's no need to grow
the array (however, you have more "pointers").
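The iteration versus direct access distinction can be sketched as follows. TraversalSketch and its two methods are hypothetical names used only for this illustration. On a LinkedList, get(i) has to walk the nodes on every call, while an Iterator just follows the next pointer, so the two loops return the same result at very different cost on long lists.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class TraversalSketch {
    // Direct access: on a LinkedList every get(i) walks from the nearer end,
    // so the whole loop is quadratic in the list length.
    public static int totalLengthByIndex(List<String> tokens) {
        int total = 0;
        for (int i = 0; i < tokens.size(); i++) {
            total += tokens.get(i).length();
        }
        return total;
    }

    // Iteration: one step per element for both ArrayList and LinkedList.
    public static int totalLengthByIterator(List<String> tokens) {
        int total = 0;
        Iterator<String> it = tokens.iterator();
        while (it.hasNext()) {
            total += it.next().length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> tokens = new LinkedList<String>(Arrays.asList("quick", "brown", "fox"));
        // Same answer either way; only the traversal cost differs.
        System.out.println(totalLengthByIndex(tokens) == totalLengthByIterator(tokens));
    }
}
```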

With this reasoning:
  - CompoundFileWriter - uses an iterator, no direct access.
  - MultipleTermPositions - same.
  - DocumentWriter - same.
So I am no longer so sure that these classes need to change.


In summary, since we can't assume the size can be estimated in advance, I think the best change
would be, as Michael suggested, to use an Iterator in CachingTokenFilter.
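A rough sketch of what "use Iterator" could look like for a caching wrapper follows. This is not the actual CachingTokenFilter code or the attached patch: CachingReplaySketch, its next()/reset() methods and the use of plain Strings in place of Tokens are simplifications assumed here for illustration. The first pass drains the source into a LinkedList (no size estimate needed), and every replay walks the cache through an Iterator rather than get(i).

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class CachingReplaySketch {
    private final Iterator<String> source; // stand-in for the wrapped token stream
    private List<String> cache;            // filled lazily on the first next()
    private Iterator<String> replay;       // current position in the cache

    public CachingReplaySketch(Iterator<String> source) {
        this.source = source;
    }

    // Returns the next cached token, or null once the cache is exhausted.
    public String next() {
        if (cache == null) {
            // First call: read the whole source once and remember it.
            cache = new LinkedList<String>();
            while (source.hasNext()) {
                cache.add(source.next());
            }
            replay = cache.iterator();
        }
        return replay.hasNext() ? replay.next() : null;
    }

    // Rewind so the cached tokens can be consumed again.
    public void reset() {
        if (cache != null) {
            replay = cache.iterator();
        }
    }
}
```

Usage would be along the lines of calling next() until it returns null, then reset() and reading the same tokens again without touching the original source.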

> Make CachingTokenFilter faster
> ------------------------------
>                 Key: LUCENE-937
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: CachingTokenFilter.patch
>
> The wrong data structure was used for the CachingTokenFilter. It should be an ArrayList
> rather than a LinkedList. There is a noticeable difference in speed.
