lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Slow merging after upgrading to 3.5
Date Thu, 19 Apr 2012 03:26:28 GMT
Super, thanks for bringing closure!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Apr 18, 2012 at 5:33 PM, Ivan Brusic <ivan@brusic.com> wrote:
> Just wanted to circle back and report on our progress.
>
> We finally applied the settings to our production environment and the
> improvements have been dramatic. Our indexing time has returned to 2.3
> levels.
>
> Thanks again,
>
> Ivan
>
> On Fri, Apr 6, 2012 at 11:36 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>> On Thu, Apr 5, 2012 at 3:31 PM, Ivan Brusic <ivan@brusic.com> wrote:
>>
>>> On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless
>>> <lucene@mikemccandless.com> wrote:
>>>> I'm assuming this is a "build once and never change" index...?  Else,
>>>> it sounds like you should never run forceMerge...
>>>
>>> Correct. The forceMerge was merely to preserve the previous 2.3
>>> behavior of using optimize.
>>
>> OK.  Avoid it, unless you can't...
>>
>>>> To preserve insertion order you just need to use one of the
>>>> Log*MergePolicy (which you are already doing).  Merge factor doesn't
>>>> affect this...
>>>
>>> I was never sure why the merge factor was set to 2. My experience in
>>> the past was to set a high merge factor when doing a batch index.
>>
>> Well, it's not entirely clear... you'd have to test in your env to be sure.
>>
>> My instinct is to use a large (maybe infinite) MF while indexing, and
>> then a big MF while forceMerge'ing.
>>
>>>> For the fastest way to get to a single-segment index.... use
>>>> NoMergePolicy while indexing the documents, and set the largest RAM
>>>> buffer you can afford.  This will create tons of segments in the index
>>>> dir, which is fine as long as you will not open a reader on it...
>>>> then:
>>>>
>>>> Open a new IW, with Log*MergePolicy, set a highish (maybe 30)
>>>> mergeFactor, and call forceMerge(1).  You may need to cutover to
>>>> SerialMergeScheduler...
>>>
>>> NoMergePolicy? Never seen that class used before.
>>
>> It's like Log*MP with infinite mergeFactor...
>>
>>> RAM buffer size is
>>> not an issue. Is the limitation still 2048MB?
>>
>> Yes.
>>
>>> Is the fastest way also the best way? :) There will never be a reader
>>> open on the index. Your second solution is similar to the existing
>>> code with the exception of the mergeFactor. Will setting the merge
>>> factor to a more reasonable number help with the merge speed?
>>
>> I think you'd have to test in your env.
>>
>> A non-infinite MF is good in that it gets some merges out of the way
>> before the end, ie, you can soak up some otherwise unused IO
>> resources/concurrency while you are indexing... making it less
>> work/time to forceMerge in the end.
>>
>>> What enforces the preservation of the insertion order? The
>>> MergePolicy?
>>
>> MergePolicy does.
>>
>> Though, in 4.0, it's also important you use only 1 thread for
>> indexing.   Prior to 4.0, docIDs were assigned in arrival order,
>> across threads, but with 4.0, each thread gets a private segment, so
>> the docIDs are jumbled.
>>
>>> How does the MergeScheduler affect things?
>>
>> It shouldn't affect docID order.
>>
>>> Used Lucene
>>> on a few projects over the years and I never had to tweak the index
>>> creation.
>>
>> The defaults normally work well... but docID assignment is an impl
>> detail and is free to change across releases...
>>
>>> I guess I need to reread the tuning chapter in LIA, it's
>>> been a few years.
>>
>> ;)
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>

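To make the merge factor discussion quoted above more concrete, here is a rough toy cost model in plain Java. It does not call Lucene and does not reproduce Lucene's real merge selection. It assumes every flush produces an equal-sized segment, that merges during indexing fire whenever the newest mergeFactor segments are on the same level, and that forceMerge(1) works in passes over runs of up to mergeFactor segments. The class and method names (MergeCostModel, indexAndForceMerge, cascade) are made up for illustration. The returned number is how many flush-sized units get rewritten by merges in total.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy cost model only; it does not call Lucene or mimic its exact merge selection. */
class MergeCostModel {

    /**
     * Returns how many flush-sized units are rewritten by merges when `flushes`
     * equal-sized segments are flushed and then reduced to a single segment.
     * Pass Integer.MAX_VALUE as indexingMergeFactor to model "no merges while indexing".
     */
    static long indexAndForceMerge(int flushes, int indexingMergeFactor, int forceMergeFactor) {
        if (indexingMergeFactor < 2 || forceMergeFactor < 2) {
            throw new IllegalArgumentException("merge factors must be >= 2");
        }
        List<long[]> segments = new ArrayList<>(); // each entry is {level, size}
        long copied = 0;

        // Indexing phase: every flush adds a level-0 segment of size 1.
        for (int i = 0; i < flushes; i++) {
            segments.add(new long[] {0, 1});
            copied += cascade(segments, indexingMergeFactor);
        }

        // forceMerge(1) phase: merge runs of up to forceMergeFactor segments, pass by pass.
        while (segments.size() > 1) {
            List<long[]> next = new ArrayList<>();
            for (int start = 0; start < segments.size(); start += forceMergeFactor) {
                int end = (int) Math.min((long) start + forceMergeFactor, segments.size());
                if (end - start == 1) {
                    next.add(segments.get(start)); // a lone segment is carried over, not rewritten
                    continue;
                }
                long size = 0;
                for (int k = start; k < end; k++) {
                    size += segments.get(k)[1];
                }
                next.add(new long[] {0, size});
                copied += size;
            }
            segments = next;
        }
        return copied;
    }

    /** Merges the newest mergeFactor segments for as long as they all share one level. */
    private static long cascade(List<long[]> segments, int mergeFactor) {
        long copied = 0;
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            long level = segments.get(n - 1)[0];
            long size = 0;
            for (int k = n - mergeFactor; k < n; k++) {
                if (segments.get(k)[0] != level) {
                    return copied; // mixed levels: nothing to merge yet
                }
                size += segments.get(k)[1];
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(new long[] {level + 1, size});
            copied += size;
        }
        return copied;
    }

    public static void main(String[] args) {
        int flushes = 64;
        System.out.println("MF=2 throughout:       " + indexAndForceMerge(flushes, 2, 2));
        System.out.println("no merges, then MF=30: " + indexAndForceMerge(flushes, Integer.MAX_VALUE, 30));
        System.out.println("MF=10, then MF=30:     " + indexAndForceMerge(flushes, 10, 30));
    }
}
```

In this model, 64 flushes cost 384 units with a merge factor of 2 throughout, against 128 units with no merging while indexing followed by a factor-30 forceMerge. With a merge factor of 10 while indexing the total is similar (124), but only 64 of those units are left for the final forceMerge, which is the "gets some merges out of the way before the end" effect. These are model numbers, not measurements, so testing in your own environment still applies.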

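The remark quoted above about 4.0 giving each indexing thread a private segment can be illustrated with a second small sketch. Again this is plain Java with no Lucene calls and no real threads. It assumes documents arrive round-robin across the threads and that docIDs follow segment order once the private segments sit side by side. Both are simplifications, and the names (DocIdOrderModel, docIdOrderWithPrivateSegments, preservesInsertionOrder) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: no real threads and no Lucene calls. */
class DocIdOrderModel {

    /**
     * Documents 0..docs-1 arrive round-robin across `threads` indexing threads.
     * Each thread appends to its own private segment; docIDs then follow
     * segment order, so the result lists arrival numbers in docID order.
     */
    static List<Integer> docIdOrderWithPrivateSegments(int docs, int threads) {
        List<List<Integer>> segments = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            segments.add(new ArrayList<>());
        }
        for (int doc = 0; doc < docs; doc++) {
            segments.get(doc % threads).add(doc);
        }
        List<Integer> order = new ArrayList<>();
        for (List<Integer> segment : segments) {
            order.addAll(segment);
        }
        return order;
    }

    /** True when docID order matches arrival order. */
    static boolean preservesInsertionOrder(List<Integer> order) {
        for (int i = 1; i < order.size(); i++) {
            if (order.get(i - 1) > order.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  " + docIdOrderWithPrivateSegments(6, 1));
        System.out.println("2 threads: " + docIdOrderWithPrivateSegments(6, 2));
    }
}
```

With one thread the arrival order survives. With two threads the six documents come out as 0, 2, 4, 1, 3, 5 in this model, which is the jumbling described in the thread and the reason for indexing with a single thread when insertion order matters.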
