lucene-java-user mailing list archives

From Peter K
Subject Re: Comparing Indexing Speed of Lucene 3.5 and 4.0
Date Thu, 05 Jan 2012 12:25:04 GMT
Hi Simon,

answers below.

>> It does not seem to be an 'IO related issue' because using RAMDirectory
>> results in the same times.
>> And indexing via Luc4 with only one thread shouldn't be slower than 3.5 (?)
> it could be, since we use a different term dictionary impl which is
> more expensive to build than in previous versions; that's just a
> guess.
> What I am really wondering is why you are using the NRT manager and
> reopen during indexing - are you measuring the NRT reopen times too?

My project requires reopening as it will then clear some caches.

Reopening isn't that frequent (every 5 seconds). When disabling it the
difference even increases slightly, but the big variation for luc4 goes

> What merge policies are you using for 3x and 4x?

The default ones. I'm now using LogByteSizeMergePolicy for both, but the
difference is nearly the same.

>>> You should add some more randomness or reality to your test.
>> Hmmh, ok. The uid and type fields are the reality in my other
>> (experimental) project, as it uses a generated and incremented id from
>> AtomicLong and two types.
>> Or do you have an explanation why luc4 can be slower on such 'simple'
>> fields?
> you reported that indexing only the ID is faster in 4.x but the other
> fields AFAIK are likely always the same for all docs, no?

No, the _uid field is different: it's the id field converted to a string.
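
For illustration, the field setup described in this thread (an id taken from an incrementing AtomicLong, a _uid that is the same id as a string, and one of two types) could be generated roughly as follows. This is only a sketch under assumptions: the class name `TestDocs`, the two type values, and the alternation between them are hypothetical, and the documents are plain maps rather than Lucene `Document` objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical generator for benchmark documents; not taken from the original project.
final class TestDocs {
    // Assumption: the thread only says there are "two types", not what they are called.
    private static final String[] TYPES = {"typeA", "typeB"};

    private final AtomicLong idGenerator = new AtomicLong();

    /** Builds the next document as a field-name to value map. */
    Map<String, Object> next() {
        long id = idGenerator.incrementAndGet();        // generated, incremented id
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);                              // numeric id field
        doc.put("_uid", Long.toString(id));             // same id converted to a string
        doc.put("type", TYPES[(int) (id % TYPES.length)]); // assumed: types simply alternate
        return doc;
    }
}
```

Every document gets a distinct id and _uid; only the type value repeats across documents.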

> you are indexing with one thread right?


> I mean my benchmarks show up
> to 300% improvement with 4.x versus older versions, so something is
> weird, i.e. non-realistic, here, or there is a bug, so let's figure this
> out. Can you profile your app and see if you find something suspicious?

I'll try now and report back.

> I'd also try to index way more documents to make your benchmarks run
> little longer just to be sure.

For ~5 times more docs (5 million) the difference is nearly the same.
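
When comparing two versions like this, it can help to run the indexing task several times and compare a robust statistic rather than a single run. The following is a minimal sketch, not the benchmark used in this thread: `IndexTimer` and `medianNanos` are hypothetical names, and the task passed in is assumed to perform one complete indexing run.

```java
import java.util.Arrays;

// Hypothetical timing helper; the actual benchmark code is not shown in this thread.
final class IndexTimer {
    /**
     * Runs the task `warmups` times untimed (to let the JIT settle), then
     * `rounds` times timed, and returns the median elapsed time in nanoseconds.
     * For an even number of rounds the upper of the two middle values is returned.
     */
    static long medianNanos(Runnable task, int warmups, int rounds) {
        if (rounds < 1) {
            throw new IllegalArgumentException("rounds must be >= 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] elapsed = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        Arrays.sort(elapsed);
        return elapsed[rounds / 2];
    }
}
```

The same task would be timed once against each Lucene version, with identical warm-up and round counts, so that only the library differs between the two measurements.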

