lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: index size growing while deleting
Date Fri, 06 Nov 2015 10:29:00 GMT
It's also important to call IndexWriter.commit (and to open new NRT
readers) periodically, or after doing a large set of updates, since that
lets Lucene remove any old segments still referenced by the prior commit
point.
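
To make the reasoning concrete, here is a minimal toy model of the retention
rule described above. It is not Lucene code: the class SegmentRetentionSketch
and all of its methods are hypothetical names invented for illustration. It
assumes the simplified rule that a segment stays on disk while the writer's
current view, the last commit point, or any open reader still references it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these classes or methods are Lucene APIs.
class SegmentRetentionSketch {

    /** A point-in-time reader that pins the segments it was opened on. */
    static final class Reader {
        final Set<String> segments;

        Reader(Collection<String> segments) {
            this.segments = new TreeSet<>(segments);
        }
    }

    private final Set<String> live = new TreeSet<>();      // writer's current view
    private Set<String> lastCommit = new TreeSet<>();      // segments in the last commit point
    private final List<Reader> openReaders = new ArrayList<>();

    void addSegment(String name) {
        live.add(name);
    }

    /** A merge replaces its input segments with one new segment in the live view. */
    void merge(Collection<String> inputs, String result) {
        live.removeAll(inputs);
        live.add(result);
    }

    /** A commit makes the current live view the new (and only) commit point. */
    void commit() {
        lastCommit = new TreeSet<>(live);
    }

    Reader openReader() {
        Reader reader = new Reader(live);
        openReaders.add(reader);
        return reader;
    }

    void closeReader(Reader reader) {
        openReaders.remove(reader);
    }

    /** Segments that cannot be deleted yet: anything still referenced somewhere. */
    Set<String> onDisk() {
        Set<String> kept = new TreeSet<>(live);
        kept.addAll(lastCommit);
        for (Reader reader : openReaders) {
            kept.addAll(reader.segments);
        }
        return kept;
    }

    public static void main(String[] args) {
        SegmentRetentionSketch index = new SegmentRetentionSketch();
        index.addSegment("_0");
        index.addSegment("_1");
        index.commit();
        Reader old = index.openReader();

        // Deletes followed by a merge rewrite _0 and _1 into _2.
        index.merge(Arrays.asList("_0", "_1"), "_2");
        System.out.println("after merge:  " + index.onDisk());

        // The new commit no longer references _0 and _1, but the old reader does.
        index.commit();
        System.out.println("after commit: " + index.onDisk());

        // Reopening (open a new reader, close the old one) releases the old segments.
        index.openReader();
        index.closeReader(old);
        System.out.println("after reopen: " + index.onDisk());
    }
}
```

In this model the old segments survive both the merge and the commit, and
disappear only once the commit has happened and the old reader is closed. In
real code the corresponding steps would be IndexWriter.commit() plus a reader
reopen (for example via DirectoryReader.openIfChanged or
SearcherManager.maybeRefresh); the actual bookkeeping inside Lucene is more
involved than this sketch.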

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 6, 2015 at 2:59 AM, Rob Audenaerde <rob.audenaerde@gmail.com> wrote:
> Hi will, others
>
> Thanks for your reply,
>
> As far as I understand it, deleting a document just sets its deleted bit,
> and the documents are only removed when segments are merged. (I am not
> really sure what this means exactly; I guess the document gets removed from
> the store and the terms no longer refer to that document. I am not sure
> whether terms get removed if they are no longer needed, etc.) If there are
> resources to read to improve my understanding, I have not found them (yet);
> if you could point me to some, that would be great!
>
> I use the default IndexWriterConfig, which I see uses TieredMergePolicy. I
> never close my IndexWriter; as I use NRT searching I just always keep it
> open.
>
> My two guesses are that: a) old segments are not removed from disk or b)
> deletes are not cleaned up as well as I thought they would be.
>
> I have made a test case which indexes 5 million rows (five iterations, five
> indexing threads, indexing and deleting all such documents after each
> iteration with deleteByQuery), with the rows randomly generated. I see the
> Taxonomy ever growing (which is logical, because facet ordinals are never
> removed as far as I understand); the index grows, but also shrinks when
> deleting. So I cannot reproduce my problem easily :(
>
> I will start diving into the Lucene source code, but I was hoping I just
> did something wrong.
>
> Any hints are appreciated!
>
> -Rob
>
>
> On Thu, Nov 5, 2015 at 2:52 PM, will <wmartinusa@gmail.com> wrote:
>
>> Hi Rob:
>>
>> Do you understand how deletes work and how an index is compacted?
>>
>> There are some configuration/runtime activities you don't mention... And
>> you make the testing process sound like a mirror of production? (Including
>> configuration?)
>>
>>
>> -will
>>
>>
>> On 11/5/15 7:33 AM, Rob Audenaerde wrote:
>>
>>> Hi all,
>>>
>>> I'm currently investigating an issue we have with our index. It keeps
>>> getting bigger, and I don't get why.
>>>
>>> Here is our use case:
>>>
>>> We index a database of about 4 million records, spread over a few hundred
>>> tables. The data consists of a mix of text, dates, numbers etc. We also
>>> add all these fields as facets.
>>> Each night we delete about 90% of the data, which in testing reduces the
>>> index size significantly.
>>> We store the data as StoredFields as well, to prevent having to access the
>>> database at all.
>>> We use FloatAssociationFacetField for the facets.
>>>
>>>
>>> In production however, it seems the index is only growing, up to 71 GB for
>>> these records for a month of running.
>>>
>>> It seems that Lucene's index is just getting bigger there.
>>>
>>> We use Lucene 5.3 on CentOS, Java 8 64-bit.
>>>
>>> The taxonomy-index does not grow significantly.
>>>
>>> How should I go about checking what is wrong?
>>>
>>> Thanks!
>>>
>>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

