lucene-dev mailing list archives

From Robert Muir <>
Subject Re: indexing_slowdown_with_latest_lucene_update
Date Mon, 10 Aug 2009 15:06:41 GMT
btw my Lucene 2.4 numbers for this corpus (running many times) average
around 41s versus 44s,
so it's still a small hit even for reasonably large docs, using simple
analyzers with reuse and all that.

so reusableTokenStream takes care of a lot of it, but not all of it.
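The reuse pattern behind reusableTokenStream can be sketched roughly as below. This is an illustrative stand-in, not Lucene's actual classes: SimpleTokenStream and ReusingAnalyzer are hypothetical names, and real Lucene analyzers cache per-thread state via the Analyzer base class. The idea is simply to hand back one cached stream per thread and reset it, instead of constructing a fresh TokenStream for every document and field:

```java
import java.io.Reader;
import java.io.StringReader;

public class ReuseSketch {

    // Stand-in for a TokenStream that can be reset onto a new Reader.
    static class SimpleTokenStream {
        Reader input;
        SimpleTokenStream(Reader input) { this.input = input; }
        void reset(Reader input) { this.input = input; }
    }

    // Stand-in for an Analyzer that caches one stream per thread rather than
    // allocating (and paying any init/reflection cost for) a new one per doc.
    static class ReusingAnalyzer {
        private final ThreadLocal<SimpleTokenStream> streams = new ThreadLocal<>();
        int constructions = 0; // counts how often we actually allocate

        SimpleTokenStream reusableTokenStream(Reader reader) {
            SimpleTokenStream ts = streams.get();
            if (ts == null) {
                constructions++;
                ts = new SimpleTokenStream(reader);
                streams.set(ts);
            } else {
                ts.reset(reader); // cheap: no per-document object creation
            }
            return ts;
        }
    }

    public static void main(String[] args) {
        ReusingAnalyzer a = new ReusingAnalyzer();
        SimpleTokenStream first = a.reusableTokenStream(new StringReader("doc one"));
        SimpleTokenStream second = a.reusableTokenStream(new StringReader("doc two"));
        // Same instance is handed back; only one construction happened.
        System.out.println(first == second);      // true
        System.out.println(a.constructions == 1); // true
    }
}
```

This removes the per-document allocation, but as the numbers above show, some fixed per-document cost remains even with reuse.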
On Mon, Aug 10, 2009 at 10:48 AM, Mark Miller<> wrote:
> Robert Muir wrote:
>> This is real and not just for very short docs.
> Yes, you still pay the cost for longer docs, but it becomes less
> important the longer the docs get, as it plays a smaller role. Load a ton of
> one-term docs, and it might be 50-60% slower; add a bunch of articles, and it
> might be closer to 15-20% (I don't know the exact numbers, but the longer I
> made the docs, the smaller the percentage slowdown, obviously). Still a good
> hit, but a short-doc test magnifies the problem.
> It affects things no matter what, but when you don't do much tokenizing or
> normalizing, the cost of the reflection/TokenStream init dominates.
> - Mark
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
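Mark's observation, that a fixed per-document init cost matters less as documents grow, can be illustrated with rough, made-up numbers (the init and per-token costs below are arbitrary units chosen only to show the trend, not measured Lucene figures):

```java
public class SlowdownSketch {
    public static void main(String[] args) {
        double initCost = 1.0;      // fixed per-document overhead (arbitrary units)
        double perTokenCost = 0.02; // cost to analyze a single token (arbitrary)
        for (int tokens : new int[] {50, 250, 5000}) {
            double analyzeCost = tokens * perTokenCost;
            // Overhead relative to the tokenizing/normalizing work itself:
            double slowdownPct = 100.0 * initCost / analyzeCost;
            System.out.printf("%5d tokens: init overhead is %.1f%% of analysis cost%n",
                              tokens, slowdownPct);
        }
    }
}
```

With these numbers the overhead falls from 100% of the analysis cost at 50 tokens to 20% at 250 tokens and 1% at 5000, which is the same shape as the short-doc versus article comparison above.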

Robert Muir

