lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Index writing performance of 3.5
Date Thu, 09 Feb 2012 12:13:43 GMT
One major thing that changed from 3.0.3 to 3.5 is that we now use
TieredMergePolicy by default. Can you try using the same merge policy
on both 3.0.3 and 3.5 and report back? I.e., LogByteSizeMergePolicy or
whatever you are using...
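
For the 3.5 side, a minimal sketch of pinning the merge policy could look like the following. It assumes the Lucene 3.5 core jar is on the classpath; the class name, the index path argument, and the choice of KeywordAnalyzer (picked only because the terms are described as non-analyzed) are illustrative assumptions, not taken from the original setup. On 3.0.3 the default is already LogByteSizeMergePolicy, so no change should be needed there.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical helper class, for illustration only.
public class SameMergePolicy35 {

    // Opens a 3.5 IndexWriter that uses LogByteSizeMergePolicy instead of
    // the 3.5 default (TieredMergePolicy). All other settings stay default.
    static IndexWriter openWriter(File indexDir) throws IOException {
        Directory dir = FSDirectory.open(indexDir);
        IndexWriterConfig conf =
            new IndexWriterConfig(Version.LUCENE_35, new KeywordAnalyzer());
        conf.setMergePolicy(new LogByteSizeMergePolicy());
        return new IndexWriter(dir, conf);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for the index directory path.
        IndexWriter writer = openWriter(new File(args[0]));
        try {
            // Run the same indexing workload here as on 3.0.3.
        } finally {
            writer.close();
        }
    }
}
```

If the slowdown disappears with this setting, the merge policy is the likely cause; if not, it points elsewhere.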

simon

On Thu, Feb 9, 2012 at 5:28 AM, Vitaly Funstein <vfunstein@gmail.com> wrote:
> Hello,
>
> I am currently evaluating Lucene 3.5.0 for upgrading from 3.0.3, and
> in the context of my usage, the most important parameter is index
> writing throughput. To that end, I have been running various tests,
> but seeing some contradictory results from different setups, which
> hopefully someone with a better knowledge of Lucene's internals could
> explain...
>
> First, let me describe my usage of Lucene, which is common across all
> of these cases.
>
> 1. Terms: non-analyzed strings or integral types, mostly. No free form
> text values on fields.
> 2. All indexed fields are stored.
> 3. Multiple threads per index writer; in the overall application this
> is currently capped at 4.
> 4. Document deletes are performed with each index update, using a
> simple string term to identify the document.
> 5. Default IndexWriter config settings are used, i.e. directory type,
> merge policy, RAM buffer size, etc.
> 6. Typical data size for an index is anywhere from a few hundred K
> docs up to a few hundred M.
> 7. Hardware config:
> - kernel 2.6.16-60 SMP (SuSE Enterprise Server 10)
> - 16x CPU
> - 16G RAM
> - ReiserFS partition for index data (more on this below)
>
> Here is where things diverge though. The first use case is a
> standalone performance test, which writes 1M documents containing 4
> fields (2 string, 2 numeric) to a single index using 10 worker
> threads. In this case, I do not see any writing performance
> degradation when going from 3.0.3 to 3.5.
>
> The second setup is a distributed multi-threaded client server
> application, where Lucene is used on the server to implement the
> search functionality. Clients have the ability to submit searchable
> data for indexing, as well as to run queries against the data. I
> realize this is a very generic description, and if needed could
> provide more specifics later. For now, let's say the second test runs
> on one such client, and submits 3 million records for the server to
> process (and also index via Lucene). Total time taken is then
> reported.
>
> But when running the test above, I can definitely observe a consistent
> increase in test times when the only thing changing is Lucene going
> from 3.0.3 to 3.5.0, on the order of 15-35%.
>
> How could I reconcile this discrepancy? My theory at this point is
> that the combination of the kernel above and ReiserFS (the default FS
> for the distro) is somehow making index writing in 3.5.0 slower,
> possibly due to the BKL (Big Kernel Lock) issue, but only when used in
> a heavily multi-threaded system. Unfortunately, I currently have no
> ext3 partitions, nor the ability to upgrade the kernel on the system,
> to prove or disprove this.
>
> Has anyone experienced issues like this in a similar setup, or maybe
> benchmarked Lucene across different file system types and release
> versions?
>
> Thanks,
> -V
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


