lucene-java-user mailing list archives

From Ivan Brusic <i...@brusic.com>
Subject Re: Slow merging after upgrading to 3.5
Date Wed, 18 Apr 2012 21:33:40 GMT
Just wanted to circle back and report on our progress.

We finally applied the settings to our production environment and the
improvements have been dramatic. Our indexing time has returned to 2.3
levels.

Thanks again,

Ivan

On Fri, Apr 6, 2012 at 11:36 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> On Thu, Apr 5, 2012 at 3:31 PM, Ivan Brusic <ivan@brusic.com> wrote:
>
>> On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>> I'm assuming this is a "build once and never change" index...?  Else,
>>> it sounds like you should never run forceMerge...
>>
>> Correct. The forceMerge was merely to preserve the previous 2.3
>> behavior of using optimize.
>
> OK.  Avoid it, unless you can't...
>
>>> To preserve insertion order you just need to use one of the
>>> Log*MergePolicy (which you are already doing).  Merge factor doesn't
>>> affect this...
>>
>> I was never sure why the merge factor was set to 2. My experience in
>> the past was to set a high merge factor when doing a batch index.
>
> Well, it's not entirely clear... you'd have to test in your env to be sure.
>
> My instinct is to use a large (maybe infinite) mergeFactor (MF) while
> indexing, and then a big MF while forceMerge'ing.
>
>>> For the fastest way to get to a single-segment index.... use
>>> NoMergePolicy while indexing the documents, and set the largest RAM
>>> buffer you can afford.  This will create tons of segments in the index
>>> dir, which is fine as long as you will not open a reader on it...
>>> then:
>>>
>>> Open a new IndexWriter (IW) with a Log*MergePolicy, set a highish
>>> (maybe 30) mergeFactor, and call forceMerge(1).  You may need to cut
>>> over to SerialMergeScheduler...
>>
>> NoMergePolicy? Never seen that class used before.
>
> It's like Log*MP with infinite mergeFactor...
>
>> RAM buffer size is
>> not an issue. Is the limitation still 2048MB?
>
> Yes.
>
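A minimal sketch of the two-step recipe discussed above: NoMergePolicy with a large RAM buffer while indexing, then a fresh IndexWriter with a Log*MergePolicy, a mergeFactor of about 30, SerialMergeScheduler and forceMerge(1). It assumes the Lucene 3.5 API with the 3.5 jar on the classpath. The class name, helper method names, field name, sample documents and the 1024 MB buffer are hypothetical choices for illustration, and LogByteSizeMergePolicy stands in for "Log*MergePolicy" (LogDocMergePolicy should also keep insertion order). Documents are added from a single thread so arrival order is unambiguous.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical class; assumes the Lucene 3.5 jar is on the classpath.
public class SingleSegmentBuild {

  // Step 1: no merging at all; every RAM-buffer flush becomes its own segment.
  static void indexWithoutMerging(Directory dir, Analyzer analyzer, Iterable<Document> docs)
      throws IOException {
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_35, analyzer)
        .setOpenMode(IndexWriterConfig.OpenMode.CREATE)
        .setMergePolicy(NoMergePolicy.NO_COMPOUND_FILES)
        // Must stay below the 2048 MB limit; 1024 MB is an arbitrary example value.
        .setRAMBufferSizeMB(1024.0);
    IndexWriter writer = new IndexWriter(dir, conf);
    try {
      for (Document doc : docs) {
        writer.addDocument(doc); // single thread, so arrival order is well defined
      }
    } finally {
      writer.close();
    }
  }

  // Step 2: reopen with a Log*MergePolicy and merge everything into one segment.
  static void mergeToOneSegment(Directory dir, Analyzer analyzer) throws IOException {
    LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
    mergePolicy.setMergeFactor(30); // the "highish" merge factor from the thread
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_35, analyzer)
        .setOpenMode(IndexWriterConfig.OpenMode.APPEND)
        .setMergePolicy(mergePolicy)
        .setMergeScheduler(new SerialMergeScheduler());
    IndexWriter writer = new IndexWriter(dir, conf);
    try {
      writer.forceMerge(1);
    } finally {
      writer.close();
    }
  }

  public static void main(String[] args) throws IOException {
    Directory dir = FSDirectory.open(new File(args[0]));
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
    List<Document> docs = new ArrayList<Document>();
    for (int i = 0; i < 1000; i++) { // hypothetical sample documents
      Document doc = new Document();
      doc.add(new Field("body", "document number " + i, Field.Store.YES, Field.Index.ANALYZED));
      docs.add(doc);
    }
    try {
      indexWithoutMerging(dir, analyzer, docs);
      mergeToOneSegment(dir, analyzer);
    } finally {
      dir.close();
    }
  }
}
```

The sketch takes the index directory as its only argument. Using two separate writers mirrors the advice above: the many unmerged segments only exist between the two steps, while no reader is open.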
>> Is the fastest way also the best way? :) There will never be a reader
>> open on the index. Your second solution is similar to the existing
>> code with the exception of the mergeFactor. Will setting the merge
>> factor to a more reasonable number help with the merge speed?
>
> I think you'd have to test in your env.
>
> A non-infinite MF is good in that it gets some merges out of the way
> before the end, i.e., you can soak up some otherwise unused IO
> resources/concurrency while you are indexing... making it less
> work/time to forceMerge in the end.
>
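To make the mergeFactor trade-off more concrete, here is a hypothetical toy model in plain Java (no Lucene dependency). It loosely imitates a Log*MergePolicy when every flush produces an equal-sized segment: whenever mergeFactor segments accumulate on one level, they are merged into a single segment one level up. It reports how many segments remain for the final forceMerge(1) and how much data, in flush-sized units, merges rewrote during indexing. Integer.MAX_VALUE stands in for an "infinite" mergeFactor, i.e. the NoMergePolicy case. Real LogMergePolicy level assignment, size limits, deletions and concurrency are all ignored, so the numbers are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model, not Lucene code: each flush writes one equal-sized
 * segment, and as soon as mergeFactor segments sit on the same level they are
 * merged into one segment on the next level up.
 */
public class MergeFactorToy {

  /** Result of simulating one indexing session. */
  public static final class Outcome {
    public final int segmentsLeft; // segments a final forceMerge(1) still has to combine
    public final long unitsCopied; // flush-sized units rewritten by merges while indexing

    Outcome(int segmentsLeft, long unitsCopied) {
      this.segmentsLeft = segmentsLeft;
      this.unitsCopied = unitsCopied;
    }
  }

  /** Pass Integer.MAX_VALUE as mergeFactor to model "never merge" (NoMergePolicy-like). */
  public static Outcome simulate(int flushes, int mergeFactor) {
    if (mergeFactor < 2) {
      throw new IllegalArgumentException("mergeFactor must be >= 2");
    }
    List<Integer> perLevel = new ArrayList<Integer>(); // segment count on each level
    long copied = 0;
    for (int f = 0; f < flushes; f++) {
      int level = 0;
      long levelSize = 1; // size of one segment on the current level, in flush units
      while (true) {
        if (perLevel.size() == level) {
          perLevel.add(0);
        }
        int count = perLevel.get(level) + 1;
        if (count < mergeFactor) {
          perLevel.set(level, count);
          break;
        }
        // mergeFactor segments on this level: merge them into one on the next level
        perLevel.set(level, 0);
        levelSize *= mergeFactor;
        copied += levelSize;
        level++;
      }
    }
    int segments = 0;
    for (int c : perLevel) {
      segments += c;
    }
    return new Outcome(segments, copied);
  }

  public static void main(String[] args) {
    int[] factors = {2, 10, 30, Integer.MAX_VALUE};
    for (int mf : factors) {
      Outcome o = simulate(1000, mf);
      String label = mf == Integer.MAX_VALUE ? "infinite" : String.valueOf(mf);
      System.out.println("mergeFactor=" + label + " segmentsLeft=" + o.segmentsLeft
          + " unitsCopied=" + o.unitsCopied);
    }
  }
}
```

In this model, with 1000 flushes a mergeFactor of 2 rewrites each unit about eight times during indexing, 10 about three times, and 30 a little under twice, while an infinite mergeFactor rewrites nothing and leaves all 1000 segments for forceMerge. A finite MF front-loads merge work that can overlap with indexing; a very small MF repeats a lot of copying.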
>> What enforces the preservation of the insertion order? The
>> MergePolicy?
>
> MergePolicy does.
>
> Though, in 4.0, it's also important that you use only 1 thread for
> indexing.   Prior to 4.0, docIDs were assigned in arrival order,
> across threads, but with 4.0, each thread gets a private segment, so
> the docIDs are jumbled.
>
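The docID-order points above can be illustrated with another hypothetical toy model in plain Java (not Lucene code). A segment is a list of arrival numbers, and a document's docID is its position when the segments are read in index order. Merging a contiguous run of segments, as Log*MergePolicy does, leaves that order unchanged. Merging non-adjacent segments, which TieredMergePolicy is able to do, reorders it. Giving each indexing thread its own segment, loosely modeling the 4.0 behavior described above, interleaves arrival order across segments. Where the merged segment lands and the round-robin thread schedule are simplifying assumptions of the toy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy model, not Lucene code. A segment is a list of arrival
 * numbers (0 = first document added); a document's docID is its position
 * when the segments are read in index order.
 */
public class DocIdOrderToy {

  /** Arrival numbers listed in docID order: the concatenation of all segments. */
  public static List<Integer> docOrder(List<List<Integer>> segments) {
    List<Integer> out = new ArrayList<Integer>();
    for (List<Integer> seg : segments) {
      out.addAll(seg);
    }
    return out;
  }

  /** Log*MergePolicy-style merge: the contiguous run [from, to) becomes one segment in place. */
  public static List<List<Integer>> mergeAdjacent(List<List<Integer>> segments, int from, int to) {
    List<List<Integer>> out = new ArrayList<List<Integer>>(segments.subList(0, from));
    out.add(docOrder(segments.subList(from, to)));
    out.addAll(segments.subList(to, segments.size()));
    return out;
  }

  /**
   * Merge of arbitrary, possibly non-adjacent segments. Toy assumption: the
   * merged segment takes the slot of the first listed segment.
   */
  public static List<List<Integer>> mergeSelected(List<List<Integer>> segments, int... picks) {
    List<Integer> merged = new ArrayList<Integer>();
    for (int p : picks) {
      merged.addAll(segments.get(p));
    }
    List<List<Integer>> out = new ArrayList<List<Integer>>();
    for (int i = 0; i < segments.size(); i++) {
      if (i == picks[0]) {
        out.add(merged);
      } else if (!contains(picks, i)) {
        out.add(segments.get(i));
      }
    }
    return out;
  }

  private static boolean contains(int[] values, int v) {
    for (int x : values) {
      if (x == v) {
        return true;
      }
    }
    return false;
  }

  /** Per-thread segments: documents alternate between threads, each filling its own segment. */
  public static List<List<Integer>> perThreadSegments(int docs, int threads) {
    List<List<Integer>> segs = new ArrayList<List<Integer>>();
    for (int t = 0; t < threads; t++) {
      segs.add(new ArrayList<Integer>());
    }
    for (int d = 0; d < docs; d++) {
      segs.get(d % threads).add(d);
    }
    return segs;
  }

  public static void main(String[] args) {
    List<List<Integer>> segs =
        Arrays.asList(Arrays.asList(0, 1), Arrays.asList(2, 3), Arrays.asList(4, 5));
    System.out.println("adjacent merge:     " + docOrder(mergeAdjacent(segs, 0, 2)));
    System.out.println("non-adjacent merge: " + docOrder(mergeSelected(segs, 0, 2)));
    System.out.println("per-thread:         " + docOrder(perThreadSegments(6, 2)));
  }
}
```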
>> How does the MergeScheduler affect things?
>
> It shouldn't affect docID order.
>
>> I've used Lucene
>> on a few projects over the years and never had to tweak the index
>> creation.
>
> The defaults normally work well... but docID assignment is an impl
> detail and is free to change across releases...
>
>> I guess I need to reread the tuning chapter in LIA (Lucene in Action);
>> it's been a few years.
>
> ;)
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
