lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: forceMerge(1) grows index and does not shrink back
Date Tue, 20 Jan 2015 12:39:20 GMT
Unclosed readers can definitely cause problems with index size, by
preventing the deletion of merged-away segments.  lsof can be useful
for diagnosing that.
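
For a JVM that memory-maps its index files (as MMapDirectory does), a rough
in-process counterpart to the lsof check might look like the sketch below. It
assumes Linux procfs, and the class and method names are hypothetical, not part
of Lucene. It parses /proc/self/maps, where a mapping whose file has been
unlinked is listed with a trailing " (deleted)", so it has to run inside the
process that holds the index.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper (not a Lucene API): finds deleted files this JVM still has mapped. */
final class DeletedMappings {

    /** Linux appends this marker to the pathname of a mapping whose file was unlinked. */
    static final String DELETED_SUFFIX = " (deleted)";

    /**
     * Parses one line of /proc/<pid>/maps. Returns the file path if the line
     * describes a deleted file under indexDir, otherwise null.
     */
    static String deletedPathUnder(String mapsLine, String indexDir) {
        // Fields: address, perms, offset, dev, inode, pathname (may contain spaces).
        String[] fields = mapsLine.trim().split("\\s+", 6);
        if (fields.length < 6) {
            return null; // anonymous mapping, no pathname
        }
        String path = fields[5];
        String prefix = indexDir.endsWith("/") ? indexDir : indexDir + "/";
        if (!path.startsWith(prefix) || !path.endsWith(DELETED_SUFFIX)) {
            return null;
        }
        return path.substring(0, path.length() - DELETED_SUFFIX.length());
    }

    /** Scans the current process; returns an empty set where procfs is unavailable. */
    static Set<String> scan(String indexDir) throws IOException {
        Set<String> held = new LinkedHashSet<>();
        Path maps = Paths.get("/proc/self/maps");
        if (!Files.isReadable(maps)) {
            return held;
        }
        for (String line : Files.readAllLines(maps, StandardCharsets.UTF_8)) {
            String path = deletedPathUnder(line, indexDir);
            if (path != null) {
                held.add(path); // the set removes duplicates from multi-chunk mappings
            }
        }
        return held;
    }

    public static void main(String[] args) throws IOException {
        String dir = args.length > 0 ? args[0] : "/path/to/index"; // placeholder path
        for (String path : scan(dir)) {
            System.out.println("still mapped after delete: " + path);
        }
    }
}
```

Files held through an ordinary open descriptor rather than a mapping show up
under /proc/self/fd instead, which this sketch does not inspect; lsof covers
both cases.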

As to the rest, I for one have lost track of what problems you've got
with which of your indexes.  I suggest you remove the forceMerge call,
double check for unclosed readers or anything else hanging on to index
files, then post a new message if you've still got problems.
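
To compare runs with and without the forceMerge call, it can help to log the
on-disk size of the index directory at intervals. A minimal sketch using only
java.nio follows. The class name and the placeholder path are assumptions for
illustration, and the code relies on a Lucene index directory being flat (no
subdirectories).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not a Lucene API): sums the sizes of the files in an index directory. */
final class IndexDirSize {

    /** Returns the total size in bytes of all regular files directly inside indexDir. */
    static long totalBytes(Path indexDir) throws IOException {
        long total = 0L;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                try {
                    if (Files.isRegularFile(file)) {
                        total += Files.size(file);
                    }
                } catch (IOException e) {
                    // A merge may delete a file between listing and sizing it: skip it.
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/path/to/index"); // placeholder path
        System.out.println(dir + ": " + totalBytes(dir) + " bytes");
    }
}
```

This counts only files still listed in the directory. Space held by deleted
files that some process keeps open is reported by df but not here, so a gap
between the two figures points at something hanging on to old segments.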


--
Ian.


On Mon, Jan 19, 2015 at 2:16 PM, Jürgen Albert
<j.albert@data-in-motion.biz> wrote:
> Hi,
>
> On 19.01.2015 at 14:13, Uwe Schindler wrote:
>>
>> Hi,
>>
>>> we use 4.8.1. We know that the javadoc advises against it. Like I wrote,
>>> the deletion of old documents (that appear during an update) would be
>>> done while closing the writer.
>>
>> This is not true. The merge policy continuously merges segments that
>> contain deletions. The problem you might have is the following:
>> If you call forceMerge(1) for the first time, your index is reduced from a
>> well distributed multi-segment index to one single, large segment. If you
>> then apply deletes, they are applied against this large segment. Newly added
>> documents are added to new segments. Those new segments are small, so they
>> are merged with preference. The deletions in the huge single segment are
>> very unlikely to be merged away, because Lucene only touches this segment
>> as a last resort. So the problem starts when you call forceMerge for the
>> first time!
>>
>> If you don’t call forceMerge and continuously index, your deletions will be
>> removed quite fast. This is especially true if the deletions are
>> well-distributed over the whole index! There are tons of instances with
>> Elasticsearch and Lucene doing this all the time. They never ever close
>> their writer. Be sure to use TieredMergePolicy (the default), because this
>> one prefers segments that have many deletions. The old LogMergePolicy does
>> not respect deletes, but should no longer be used, unless you rely on a
>> specific index order of your documents.
>
> We use the default, which is the TieredMergePolicy as far as I can see. If
> what you write is true, I wonder why our index started growing in the first
> place. We have 2 indices, where the bigger one receives an update on every
> document every couple of days and a smaller one where every document is
> updated randomly over a period of roughly 3 minutes. After a couple of days,
> the indices became 12 GB each (the bigger one started with 2 GB and the
> smaller one with a couple of megabytes). This should not happen if the
> MergePolicy works as intended. Can unclosed readers cause such a problem? We
> use a SearcherManager to avoid this, but the possibility always remains.
>
> On the other hand we have the case I initially described. We have a fresh
> index, that we populate. No reader is opened and no additional updates have
> been made. Therefore I see no reason why forceMerge triples the size of the
> index at all.
>>>
>>> Unfortunately we can't close the writer, so we chose the force merge as
>>> the alternative with less effort. Could forceMergeDeletes serve our
>>> purpose here?
>>
>> It could, but it has the same problem as above. The only difference from
>> forceMerge is that it only merges segments which have deletions.
>>
>>> I will take a look into it with lsof, but I'm pretty sure the files will
>>> be held by some Java process.
>>>
>>> Jürgen.
>>>
>>> On 19.01.2015 at 13:36, Ian Lea wrote:
>>>>
>>>> Do you need to call forceMerge(1) at all?  The javadoc, certainly for
>>>> recent versions of lucene, advises against it.  What version of lucene
>>>> are you running?
>>>>
>>>> It might be helpful to run lsof against the index directory
>>>> before/during/after the merge to see what files are coming or going,
>>>> or if there are any marked as deleted but still present.  That would
>>>> imply that something, somewhere, was holding on to the files.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Fri, Jan 16, 2015 at 1:57 PM, Jürgen Albert
>>>> <j.albert@data-in-motion.biz> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> because we have constant updates on our index, we can't really close
>>>>> the index from time to time. Therefore we decided to trigger
>>>>> forceMerge when the traffic is lowest, to clean up.
>>>>>
>>>>> On our development laptops (Windows and Linux) it works as expected,
>>>>> but on the real servers we see some weird behaviour.
>>>>>
>>>>> Scenario:
>>>>>
>>>>> We create a fresh index and populate it. This results in an index
>>>>> with a size of 2 GB. If we trigger forceMerge(1) and a commit()
>>>>> afterwards for this index, the index grows over the next 10 minutes
>>>>> to 6 GB and does not shrink back. During the whole process no reader
>>>>> is opened on the index.
>>>>>
>>>>> If I try the same stunt with the same data on my Windows Laptop, it
>>>>> does nothing at all and finishes after a few ms.
>>>>>
>>>>> Any Ideas?
>>>>>
>>>>> Technical details:
>>>>> We use an MMapDirectory and the server runs Debian 7 (kernel 3.2) in a
>>>>> KVM. The file system is ext4.
>>>>>
>>>>> Thx,
>>>>>
>>>>> Jürgen Albert.
>>>>>
>>>>> --
>>>>> Jürgen Albert
>>>>> Managing Director
>>>>>
>>>>> Data In Motion UG (haftungsbeschränkt)
>>>>>
>>>>> Kahlaische Str. 4
>>>>> 07745 Jena
>>>>>
>>>>> Mobile: 0157-72521634
>>>>> E-Mail: j.albert@datainmotion.de
>>>>> Web: www.datainmotion.de
>>>>>
>>>>> XING:   https://www.xing.com/profile/Juergen_Albert5
>>>>>
>>>>> Legal
>>>>>
>>>>> Jena HBR 507027
>>>>> VAT ID: DE274553639
>>>>> Tax no.: 162/107/04586
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>