lucene-java-user mailing list archives

From Vitaly Funstein <vfunst...@gmail.com>
Subject Index writing performance of 3.5
Date Thu, 09 Feb 2012 04:28:08 GMT
Hello,

I am currently evaluating Lucene 3.5.0 for upgrading from 3.0.3, and
in the context of my usage, the most important parameter is index
writing throughput. To that end, I have been running various tests,
but I am seeing contradictory results from different setups, which
hopefully someone with better knowledge of Lucene's internals can
explain.

First, let me describe my usage of Lucene, which is common across all
of these cases.

1. Terms: non-analyzed strings or integral types, mostly. No free form
text values on fields.
2. All indexed fields are stored.
3. Multiple threads per index writer, in the overall application
currently capped at 4.
4. Document deletes are performed with each index update, using a
simple string term to identify the document.
5. Default IndexWriter config settings are used, i.e. directory type,
merge policy, RAM buffer size, etc.
6. Typical data size for an index is anywhere from a few hundred K
docs up to a few hundred M.
7. Hardware config:
- kernel 2.6.16-60 SMP (SuSE Enterprise Server 10)
- 16x CPU
- 16G RAM
- ReiserFS partition for index data (more on this below)

Here is where things diverge though. The first use case is a
standalone performance test, which writes 1M documents containing 4
fields (2 string, 2 numeric) to a single index using 10 worker
threads. In this case, I do not see any writing performance
degradation when going from 3.0.3 to 3.5.
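
For anyone who wants to reproduce the shape of the standalone test, here is a minimal sketch of a multi-threaded throughput harness. It is not the actual test code: the class name WriteThroughputHarness, the run method, and the indexOp callback are hypothetical names introduced for illustration, and the sketch uses only the JDK (Java 8+ lambdas) so that it runs without Lucene on the classpath. In a real run, indexOp would presumably build the 4-field document and call IndexWriter.updateDocument(Term, Document), which deletes by term and adds the new document.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

public class WriteThroughputHarness {

    /**
     * Invokes indexOp once per document id in [0, totalDocs), spread across
     * the given number of worker threads, and returns documents per second.
     * indexOp is a placeholder for the real indexing call (assumption: in a
     * Lucene test it would wrap IndexWriter.updateDocument(Term, Document)).
     */
    public static double run(int threads, long totalDocs, LongConsumer indexOp)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong nextId = new AtomicLong();             // shared work counter
        CountDownLatch start = new CountDownLatch(1);     // releases all workers at once
        CountDownLatch done = new CountDownLatch(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();
                    long id;
                    // Each worker claims the next unprocessed id until none remain.
                    while ((id = nextId.getAndIncrement()) < totalDocs) {
                        indexOp.accept(id);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }

        long begin = System.nanoTime();
        start.countDown();
        done.await();
        long elapsedNanos = Math.max(1L, System.nanoTime() - begin);
        pool.shutdown();
        return totalDocs / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong sink = new AtomicLong();
        // Stand-in workload; replace the lambda with the real indexing call.
        double rate = run(10, 1_000_000L, id -> sink.addAndGet(id));
        System.out.printf("%.0f docs/sec%n", rate);
    }
}
```

Running the same harness with the same callback against each Lucene version, and again with the thread count set to 4 to match the application cap, would give numbers that are directly comparable to the application-level test.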

The second setup is a distributed, multi-threaded client-server
application, where Lucene is used on the server to implement the
search functionality. Clients have the ability to submit searchable
data for indexing, as well as to run queries against the data. I
realize this is a very generic description, and if needed could
provide more specifics later. For now, let's say the second test runs
on one such client, and submits 3 million records for the server to
process (and also index via Lucene). Total time taken is then
reported.

But when running the test above, I can definitely observe a consistent
increase in test times when the only thing changing is Lucene going
from 3.0.3 to 3.5.0, on the order of 15-35%.

How could I reconcile this discrepancy? My theory at this point is
that the combination of the kernel above and ReiserFS (default FS for
the distro) is somehow making index writing in 3.5.0 slower, possibly
due to the BKL (Big Kernel Lock) issue, but only when used in a heavily multi-threaded
system. Unfortunately, I currently have no ext3 partitions, or ability
to upgrade the kernel on the system to prove or disprove this.

Has anyone experienced issues like this in a similar setup, or maybe
benchmarked Lucene across different file system types and release
versions?

Thanks,
-V
