lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: forceMerge(1) grows index and does not shrink back
Date Mon, 19 Jan 2015 13:13:36 GMT
Hi,

> we use 4.8.1. We know that the javadoc advises against it. Like I wrote, the
> deletion of old documents (that appear during an update) would be done
> while closing the writer.

This is not true. The merge policy continuously merges segments that contain deletions. The
problem you might have is the following:
If you call forceMerge(1) for the first time, your index is reduced from a well-distributed
multi-segment index to one single, large segment. If you then apply deletes, they are applied
against this large segment. Newly added documents go into new segments. Those new segments
are small, so they are merged with preference. The deletions in the huge single segment are
very unlikely to be merged away, because Lucene only touches this segment as a last resort. So
the problem starts when you call forceMerge for the first time!
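
To make the effect concrete, here is a rough toy model of the situation described above. It is not Lucene code: the class, the method names and the "merge everything below a size threshold" rule are assumptions made up for this sketch, and it assumes every update deletes a document that lives in the big segment. It only shows that deletes keep piling up in a segment that the merge step never selects.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these names exist in Lucene. */
final class LingeringDeletesSketch {

    /** A segment stores maxDoc documents, of which `deleted` are only marked as deleted. */
    static final class Segment {
        final int maxDoc;
        int deleted;

        Segment(int maxDoc, int deleted) {
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }

        int liveDocs() {
            return maxDoc - deleted;
        }
    }

    /** Toy merge step: combine all segments below smallThreshold docs and drop their deleted docs. */
    static List<Segment> mergeSmall(List<Segment> segments, int smallThreshold) {
        List<Segment> result = new ArrayList<Segment>();
        int mergedLive = 0;
        int mergedCount = 0;
        for (Segment s : segments) {
            if (s.maxDoc < smallThreshold) {
                mergedLive += s.liveDocs();
                mergedCount++;
            } else {
                result.add(s); // large segments are left untouched
            }
        }
        if (mergedCount > 0) {
            result.add(new Segment(mergedLive, 0)); // merged segment carries no deletes
        }
        return result;
    }

    /** Returns how many deleted docs still sit in the big segment after the given update rounds. */
    static int deletedStillStored(int bigDocs, int rounds, int updatesPerRound, int smallThreshold) {
        if (bigDocs < smallThreshold) {
            throw new IllegalArgumentException("the big segment must not count as small");
        }
        Segment big = new Segment(bigDocs, 0); // stands in for the result of forceMerge(1)
        List<Segment> segments = new ArrayList<Segment>();
        segments.add(big);
        for (int i = 0; i < rounds; i++) {
            // Each update marks an old doc in the big segment as deleted ...
            big.deleted = Math.min(big.maxDoc, big.deleted + updatesPerRound);
            // ... and writes the new version into a fresh small segment.
            segments.add(new Segment(updatesPerRound, 0));
            segments = mergeSmall(segments, smallThreshold);
        }
        return big.deleted;
    }

    public static void main(String[] args) {
        System.out.println(deletedStillStored(1000000, 10, 1000, 100000));
    }
}
```

In this model the small segments are cleaned up on every round, while the deleted documents in the big segment are never reclaimed and keep occupying disk space.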

If you don’t call forceMerge and index continuously, your deletions will be removed quite
fast. This is especially true if the deletions are well-distributed over the whole index!
Plenty of Elasticsearch and Lucene installations do this all the time, and they never
close their writer. Be sure to use TieredMergePolicy (the default), because this one
prefers segments that have many deletions. The old LogMergePolicy does not respect deletes
and should no longer be used unless you rely on a specific index order of your documents.
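
As a hedged illustration of what "prefers segments that have many deletions" means, the following sketch ranks segments by a toy score. The score function, the reclaimWeight parameter and the class name are assumptions for this example only; the real TieredMergePolicy scoring also takes segment sizes and other factors into account, which are left out here.

```java
/** Toy model only: not Lucene's scoring formula. */
final class DeletesAwareChoiceSketch {

    /**
     * Toy score, lower is better: the fewer live docs a segment has relative
     * to its stored docs, the more attractive it is to merge. Assumes maxDoc > 0.
     * A reclaimWeight of 0 makes the score ignore deletes entirely.
     */
    static double score(int maxDoc, int deleted, double reclaimWeight) {
        double liveRatio = (double) (maxDoc - deleted) / maxDoc;
        return Math.pow(liveRatio, reclaimWeight);
    }

    /** Returns the index of the segment with the lowest score (first one wins on ties). */
    static int pick(int[] maxDocs, int[] deleted, double reclaimWeight) {
        int best = 0;
        for (int i = 1; i < maxDocs.length; i++) {
            if (score(maxDocs[i], deleted[i], reclaimWeight)
                    < score(maxDocs[best], deleted[best], reclaimWeight)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] maxDocs = {1000, 1000, 1000};
        int[] deleted = {10, 400, 50};
        System.out.println(pick(maxDocs, deleted, 2.0)); // deletes-aware choice
        System.out.println(pick(maxDocs, deleted, 0.0)); // deletes ignored
    }
}
```

With a positive weight the segment carrying 400 deletes is chosen first; with a weight of 0 all three segments look identical, which is roughly the blind spot of a policy that does not respect deletes.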

> Unfortunately we can't close the writer and we
> chose the force merge as the alternative with less effort. Could
> forceMergeDeletes serve our purpose here?

It could, but it has the same problem as above. The only difference from forceMerge is that it
only merges segments which have deletions.
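
A small sketch of why this does not help much once a huge segment exists. Again this is a toy model, not Lucene code: the class, the method names and the pctAllowed threshold parameter are assumptions for illustration. It selects only segments whose share of deleted documents exceeds the threshold and counts how many live documents would have to be copied to rewrite them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these names exist in Lucene. */
final class ForceMergeDeletesSketch {

    /** Returns the indexes of segments whose deleted percentage is above pctAllowed. */
    static List<Integer> segmentsToRewrite(int[] maxDocs, int[] deleted, double pctAllowed) {
        List<Integer> picked = new ArrayList<Integer>();
        for (int i = 0; i < maxDocs.length; i++) {
            double pctDeleted = 100.0 * deleted[i] / maxDocs[i];
            if (pctDeleted > pctAllowed) {
                picked.add(i);
            }
        }
        return picked;
    }

    /** Counts the live docs that must be copied when the picked segments are rewritten. */
    static long liveDocsToCopy(int[] maxDocs, int[] deleted, double pctAllowed) {
        long total = 0;
        for (int i : segmentsToRewrite(maxDocs, deleted, pctAllowed)) {
            total += maxDocs[i] - deleted[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] maxDocs = {1000000, 5000, 5000};
        int[] deleted = {150000, 0, 100};
        System.out.println(segmentsToRewrite(maxDocs, deleted, 10.0));
        System.out.println(liveDocsToCopy(maxDocs, deleted, 10.0));
    }
}
```

In this example only the big segment qualifies, so reclaiming its deletes still means rewriting almost the entire index, which is the same cost and temporary disk usage you see with forceMerge.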

> I will take a look into it with lsof, but I'm pretty sure the files will be held by
> some Java process.
> 
> Jürgen.
> 
> On 19.01.2015 at 13:36, Ian Lea wrote:
> > Do you need to call forceMerge(1) at all?  The javadoc, certainly for
> > recent versions of lucene, advises against it.  What version of lucene
> > are you running?
> >
> > It might be helpful to run lsof against the index directory
> > before/during/after the merge to see what files are coming or going,
> > or if there are any marked as deleted but still present.  That would
> > imply that something, somewhere, was holding on to the files.
> >
> >
> > --
> > Ian.
> >
> >
> > On Fri, Jan 16, 2015 at 1:57 PM, Jürgen Albert
> > <j.albert@data-in-motion.biz> wrote:
> >> Hi,
> >>
> >> because we have constant updates on our index, we can't really close
> >> the index from time to time. Therefore we decided to trigger
> >> forceMerge when the traffic is lowest, to clean up.
> >>
> >> On our development laptops (Windows and Linux) it works as expected,
> >> but on the real servers we see some weird behaviour.
> >>
> >> Scenario:
> >>
> >> We create a fresh index and populate it. This results in an index
> >> with a size of 2 GB. If we trigger forceMerge(1) and a commit()
> >> afterwards for this index, the index grows over the next 10 minutes
> >> to 6 GB and does not shrink back. During the whole process no reader is
> >> opened on the index.
> >> If I try the same stunt with the same data on my Windows laptop, it
> >> does nothing at all and finishes after a few ms.
> >>
> >> Any Ideas?
> >>
> >> Technical details:
> >> We use an MMapDirectory and the server is Debian 7 (kernel 3.2) in a
> >> KVM. The file system is ext4.
> >>
> >> Thx,
> >>
> >> Jürgen Albert.
> >>
> >> --
> >> Jürgen Albert
> >> Managing Director
> >>
> >> Data In Motion UG (haftungsbeschränkt)
> >>
> >> Kahlaische Str. 4
> >> 07745 Jena
> >>
> >> Mobile: 0157-72521634
> >> E-Mail: j.albert@datainmotion.de
> >> Web: www.datainmotion.de
> >>
> >> XING:   https://www.xing.com/profile/Juergen_Albert5
> >>
> >> Legal
> >>
> >> Jena HBR 507027
> >> VAT ID: DE274553639
> >> Tax no.: 162/107/04586
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>