lucene-java-user mailing list archives

From Vitaly Funstein <vfunst...@gmail.com>
Subject Re: Index writing performance of 3.5
Date Sat, 11 Feb 2012 05:10:26 GMT
Tried changing the merge policy but it had no effect on the test
times. But I can rule out ReiserFS as the culprit now too, since I was
able to run with indexes stored on an ext3 partition, and observed a
similar slowdown.

So there's something else going on here with this particular test
setup, but I can't really distill a simpler use case, so for now I'll
leave it at that. Will post to this thread again if I find something
promising...
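
For anyone who wants to try reproducing the comparison, here is a minimal sketch of the kind of timing harness the standalone test quoted below describes: a fixed number of documents split across a pool of worker threads, with total wall-clock time reported. It uses only the JDK. The class name `ThroughputHarness`, the `indexOne` callback and the `slowdownPercent` helper are hypothetical names made up for illustration, not part of Lucene or of the actual test code. In a real run the callback would wrap whatever per-document indexing call the application makes, against one shared writer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

class ThroughputHarness {

    /**
     * Calls indexOne once for each id in [0, totalDocs), spread across
     * `threads` workers, and returns the elapsed wall-clock time in nanoseconds.
     * indexOne is a placeholder for the real per-document indexing work.
     */
    public static long run(int totalDocs, int threads, IntConsumer indexOne)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger next = new AtomicInteger(0);      // next document id to hand out
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    int id;
                    // Each worker pulls ids until the shared counter is exhausted.
                    while ((id = next.getAndIncrement()) < totalDocs) {
                        indexOne.accept(id);
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();                                   // wait for every worker to finish
        long elapsed = System.nanoTime() - start;
        pool.shutdown();
        return elapsed;
    }

    /** Relative change of candidate vs. baseline, in percent (positive means slower). */
    public static double slowdownPercent(long baselineNanos, long candidateNanos) {
        return 100.0 * (candidateNanos - baselineNanos) / baselineNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger indexed = new AtomicInteger();
        // Stand-in workload: just count documents instead of indexing them.
        long nanos = run(1_000_000, 10, id -> indexed.incrementAndGet());
        System.out.printf("%d docs in %d ms%n",
                indexed.get(), TimeUnit.NANOSECONDS.toMillis(nanos));
    }
}
```

Running the same harness once per Lucene version, with everything else held fixed, and feeding the two timings to `slowdownPercent` gives a figure comparable to the 15-35% quoted below. A single run is noisy, so several repetitions per version would be needed before reading much into the difference.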

On Thu, Feb 9, 2012 at 4:13 AM, Simon Willnauer
<simon.willnauer@googlemail.com> wrote:
> one major thing that changed from 3.0.3 to 3.5 is that we use
> TieredMergePolicy by default. can you try to use the same merge policy
> on both 3.0.3 and 3.5 and report back? i.e. LogByteSizeMergePolicy or
> whatever you are using...
>
> simon
>
> On Thu, Feb 9, 2012 at 5:28 AM, Vitaly Funstein <vfunstein@gmail.com> wrote:
>> Hello,
>>
>> I am currently evaluating Lucene 3.5.0 for upgrading from 3.0.3, and
>> in the context of my usage, the most important parameter is index
>> writing throughput. To that end, I have been running various tests,
>> but seeing some contradictory results from different setups, which
>> hopefully someone with a better knowledge of Lucene's internals could
>> explain...
>>
>> First, let me describe my usage of Lucene, which is common across all
>> of these cases.
>>
>> 1. Terms: non-analyzed strings or integral types, mostly. No free form
>> text values on fields.
>> 2. All indexed fields are stored.
>> 3. Multiple threads per index writer, currently capped at 4 in the
>> overall application.
>> 4. Document deletes are performed with each index update, using a
>> simple string term to identify the document.
>> 5. Default IndexWriter config settings are used, i.e. directory type,
>> merge policy, RAM buffer size, etc.
>> 6. Typical data size for an index is anywhere from a few hundred K
>> docs up to a few hundred M.
>> 7. Hardware config:
>> - kernel 2.6.16-60 SMP (SuSE Enterprise Server 10)
>> - 16x CPU
>> - 16G RAM
>> - ReiserFS partition for index data (more on this below)
>>
>> Here is where things diverge though. The first use case is a
>> standalone performance test, which writes 1M documents containing 4
>> fields (2 string, 2 numeric) to a single index using 10 worker
>> threads. In this case, I do not see any writing performance
>> degradation when going from 3.0.3 to 3.5.
>>
>> The second setup is a distributed multi-threaded client server
>> application, where Lucene is used on the server to implement the
>> search functionality. Clients have the ability to submit searchable
>> data for indexing, as well as to run queries against the data. I
>> realize this is a very generic description, and if needed could
>> provide more specifics later. For now, let's say the second test runs
>> on one such client, and submits 3 million records for the server to
>> process (and also index via Lucene). Total time taken is then
>> reported.
>>
>> But when running the test above, I can definitely observe a consistent
>> increase in test times when the only thing changing is Lucene going
>> from 3.0.3 to 3.5.0, on the order of 15-35%.
>>
>> How could I reconcile this discrepancy? My theory at this point is
>> that the combination of the kernel above and ReiserFS (default FS for
>> the distro) is somehow making index writing in 3.5.0 slower, possibly due
>> to the BKL issue, but only when used in a heavily multi-threaded
>> system. Unfortunately, I currently have no ext3 partitions, or ability
>> to upgrade the kernel on the system to prove or disprove this.
>>
>> Has anyone experienced issues like this in a similar setup, or maybe
>> benchmarked Lucene across different file system types and release
>> versions?
>>
>> Thanks,
>> -V
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>