lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: indexing_slowdown_with_latest_lucene_udpate
Date Mon, 10 Aug 2009 15:06:41 GMT
btw my lucene 2.4 numbers for this corpus (averaged over many runs) come to
around 41s versus 44s,
so it's still a small hit even for reasonably large docs, using simple
analyzers with reuse and all that.

so reusableTokenStream takes care of a lot of it, but not all of it.
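
a rough way to picture why the hit shrinks with longer docs but never fully goes away: treat the extra cost as a fixed amount per document, added on top of work that grows with the token count. the sketch below is a toy model only. the class name, method name, and every number in it are made up for illustration and are not measurements of lucene.

```java
public class OverheadModel {

    /**
     * Percent slowdown caused by a fixed extra cost per document, relative to
     * a baseline of (basePerDoc + tokensPerDoc * costPerToken).
     * All inputs are in the same arbitrary time unit.
     */
    public static double slowdownPercent(int tokensPerDoc, double costPerToken,
                                         double basePerDoc, double extraPerDoc) {
        double baseline = basePerDoc + tokensPerDoc * costPerToken;
        return 100.0 * extraPerDoc / baseline;
    }

    public static void main(String[] args) {
        double costPerToken = 1.0; // hypothetical per-token analysis cost
        double basePerDoc = 0.0;   // hypothetical fixed cost already paid per doc
        double extraPerDoc = 0.5;  // hypothetical added per-doc init cost

        // The same fixed overhead matters less and less as docs get longer.
        for (int tokens : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%5d tokens/doc: %.2f%% slower%n",
                    tokens, slowdownPercent(tokens, costPerToken, basePerDoc, extraPerDoc));
        }
    }
}
```

in this model, reusing the stream corresponds to making extraPerDoc smaller rather than zero, so the percentage drops but some overhead remains.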
On Mon, Aug 10, 2009 at 10:48 AM, Mark Miller<markrmiller@gmail.com> wrote:
> Robert Muir wrote:
>>
>> This is real and not just for very short docs.
>
> Yes, you still pay the cost for longer docs, but it just becomes less
> important the longer the docs, as it plays a smaller role. Load a ton of one
> term docs, and it might be 50-60% slower - add a bunch of articles, and it
> might be closer to 20%-15% (I don't know the numbers, but the longer I made
> the docs, the less % slowdown, obviously). Still a good hit, but a short doc
> test magnifies the problem.
>
> It affects things no matter what, but when you don't do much tokenizing,
> normalizing, the cost of the reflection/tokenstream init dominates.
>
> - Mark
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Robert Muir
rcmuir@gmail.com
