lucene-java-user mailing list archives

From Rob Audenaerde <rob.audenae...@gmail.com>
Subject Re: index size growing while deleting
Date Fri, 06 Nov 2015 07:59:22 GMT
Hi Will, others,

Thanks for your reply,

As far as I understand it, deleting a document just sets the deleted
bit, and the documents are only actually removed when segments are merged.
(I'm not really sure what this means exactly; I guess the document gets
removed from the store and the terms no longer refer to that document. I'm
not sure whether terms get removed when they are no longer needed, etc.) If
there are resources to read to improve my understanding, I have not found
them (yet); if you could point me to some, that would be great!
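A toy Java model of the delete/merge mechanism described above. It is plain Java, not Lucene's actual classes, and the names `ToySegments`, `Segment` and `merge` are hypothetical: a segment is written once, a delete only flips a bit, and space is reclaimed only when a merge copies the live documents into a new segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Toy model only, NOT Lucene's classes: a "segment" is a write-once list of
// documents plus a bit set that marks which of them are deleted.
public class ToySegments {

    static final class Segment {
        final List<String> docs;                // never modified after creation
        final BitSet deleted = new BitSet();    // a delete only flips a bit here

        Segment(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }

        // Deleting marks the document; its data stays in the segment.
        void delete(int docId) {
            deleted.set(docId);
        }

        int maxDoc()  { return docs.size(); }                          // includes deleted docs
        int numDocs() { return docs.size() - deleted.cardinality(); }  // live docs only
    }

    // A merge copies only the live documents into a new segment; the old
    // segments (and the space held by their deleted documents) can then be dropped.
    static Segment merge(List<Segment> segments) {
        List<String> live = new ArrayList<>();
        for (Segment s : segments) {
            for (int i = 0; i < s.maxDoc(); i++) {
                if (!s.deleted.get(i)) {
                    live.add(s.docs.get(i));
                }
            }
        }
        return new Segment(live);
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add("doc" + i);
        }
        Segment seg = new Segment(docs);

        // Delete 9 of the 10 documents, similar to a nightly 90% delete.
        for (int i = 0; i < 9; i++) {
            seg.delete(i);
        }
        System.out.println("before merge: maxDoc=" + seg.maxDoc() + ", numDocs=" + seg.numDocs());

        Segment merged = merge(Arrays.asList(seg));
        System.out.println("after merge: maxDoc=" + merged.maxDoc() + ", numDocs=" + merged.numDocs());
    }
}
```

Before the merge the toy segment still holds all 10 documents (maxDoc=10) with only 1 live; after the merge it holds just the live one. Real Lucene exposes the same two counts as `maxDoc()` and `numDocs()` on `IndexWriter` and `IndexReader`, so a large gap between them on the production index would point at unreclaimed deletes.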

I use the default IndexWriterConfig, which I see uses TieredMergePolicy. I
never close my IndexWriter; as I use NRT searching, I just always keep it
open.
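Whether deleted documents are actually reclaimed depends on which segments the merge policy chooses to merge. The sketch is a deliberately simplified stand-in, not TieredMergePolicy's real selection logic: it picks segments whose share of deleted documents exceeds a percentage threshold, loosely mirroring the idea behind `IndexWriter.forceMergeDeletes()` combined with TieredMergePolicy's `forceMergeDeletesPctAllowed` setting (default 10.0). The types `ToyDeletesSelection` and `SegStats` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch, NOT Lucene's TieredMergePolicy: select the segments whose
// percentage of deleted documents exceeds a threshold.
public class ToyDeletesSelection {

    // Minimal stand-in for per-segment statistics (hypothetical type).
    static final class SegStats {
        final String name;
        final int maxDoc;     // all docs, including deleted ones
        final int delCount;   // docs marked as deleted

        SegStats(String name, int maxDoc, int delCount) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.delCount = delCount;
        }

        double pctDeleted() {
            return maxDoc == 0 ? 0.0 : 100.0 * delCount / maxDoc;
        }
    }

    // Returns the names of segments that would be rewritten to drop deletes.
    static List<String> selectForDeletesMerge(List<SegStats> segments, double pctAllowed) {
        List<String> selected = new ArrayList<>();
        for (SegStats s : segments) {
            if (s.pctDeleted() > pctAllowed) {
                selected.add(s.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<SegStats> segs = new ArrayList<>();
        segs.add(new SegStats("_a", 1_000_000, 900_000)); // 90% deleted
        segs.add(new SegStats("_b", 200_000, 10_000));    // 5% deleted
        segs.add(new SegStats("_c", 50_000, 0));          // no deletes
        System.out.println(selectForDeletesMerge(segs, 10.0)); // prints [_a]
    }
}
```

The real policy also weighs segment sizes and the number of segments per tier during normal merging, so a heavily deleted segment is not guaranteed to be merged promptly; this sketch only shows the threshold idea.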

My two guesses are that a) old segments are not removed from disk, or b)
deletes are not cleaned up as well as I thought they would be.
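One way guess a) can happen with NRT searching: Lucene reference-counts index files internally, and an NRT reader that is never closed (or a searcher acquired from a `SearcherManager` that is never released) keeps the files of already-merged segments alive on disk. The toy model is a simplified illustration of that reference counting, not Lucene's code; `ToyFileRefCounts` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, NOT Lucene's internal file deleter: each file carries a
// reference count and is removed from "disk" only when nothing references it.
public class ToyFileRefCounts {
    private final Map<String, Integer> refCounts = new HashMap<>();
    final Set<String> onDisk = new TreeSet<>(); // sorted, for stable printing

    void write(String file) {
        onDisk.add(file);
    }

    void incRef(List<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    void decRef(List<String> files) {
        for (String f : files) {
            int count = refCounts.merge(f, -1, Integer::sum);
            if (count == 0) {
                refCounts.remove(f);
                onDisk.remove(f); // last reference gone: the file can be deleted
            }
        }
    }

    public static void main(String[] args) {
        ToyFileRefCounts dir = new ToyFileRefCounts();
        dir.write("_0.cfs");
        dir.write("_1.cfs");

        List<String> oldSegments = Arrays.asList("_0.cfs", "_1.cfs");
        dir.incRef(oldSegments); // held by the writer's current segment list
        dir.incRef(oldSegments); // held by an NRT reader opened on that list

        // A merge writes _2.cfs and the writer drops its references to _0 and _1.
        dir.write("_2.cfs");
        dir.incRef(Arrays.asList("_2.cfs"));
        dir.decRef(oldSegments);
        System.out.println("reader still open: " + dir.onDisk); // _0, _1 and _2 remain

        dir.decRef(oldSegments); // the reader is finally closed/released
        System.out.println("reader released: " + dir.onDisk);   // only _2 remains
    }
}
```

If an application forgets to release even one old reader per refresh, every merged-away segment it saw stays on disk, which would look exactly like an index that only grows.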

I have made a test case that indexes 5 million randomly generated rows (five
iterations, five indexing threads, indexing and then deleting all such
documents after each iteration with deleteByQuery). I see the taxonomy
growing constantly (which is logical, because facet ordinals are never
removed, as far as I understand); the index grows, but also shrinks when
deleting. So I cannot easily reproduce my problem :(
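The ever-growing taxonomy follows from the taxonomy being append-only: each new facet label gets a new ordinal, and deleting documents never removes one. A toy sketch of that behavior (not Lucene's `DirectoryTaxonomyWriter`; `ToyTaxonomy` and `addCategory` are hypothetical names, and the real taxonomy also reserves ordinal 0 for its root):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Toy model: facet labels map to ordinals in an append-only table.
// There is no remove operation, so the table grows with the number of
// distinct labels ever indexed, regardless of later document deletes.
public class ToyTaxonomy {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<String> labels = new ArrayList<>();

    int addCategory(String label) {
        Integer ord = ordinals.get(label);
        if (ord != null) {
            return ord;                 // existing label: reuse its ordinal
        }
        int newOrd = labels.size();     // new label: next free ordinal
        labels.add(label);
        ordinals.put(label, newOrd);
        return newOrd;
    }

    int size() {
        return labels.size();
    }

    public static void main(String[] args) {
        ToyTaxonomy taxo = new ToyTaxonomy();
        Random rnd = new Random(42);
        for (int iteration = 1; iteration <= 5; iteration++) {
            // Random values, as in the test case: mostly new labels each round.
            // Deleting the documents afterwards would not shrink this table.
            for (int d = 0; d < 1000; d++) {
                taxo.addCategory("value/" + rnd.nextInt(1_000_000));
            }
            System.out.println("after iteration " + iteration + ": " + taxo.size() + " ordinals");
        }
    }
}
```

With mostly unique random values the ordinal count keeps rising every iteration, while with a small, repeating set of values (as real database fields often have) it levels off, which may explain why the production taxonomy barely grows.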

I will start diving into the Lucene source code, but I was hoping I just
did something wrong.

Any hints are appreciated!

-Rob


On Thu, Nov 5, 2015 at 2:52 PM, will <wmartinusa@gmail.com> wrote:

> Hi Rob:
>
> Do you understand how deletes work and how an index is compacted?
>
> There are some configuration/runtime activities you don't mention... And
> you make the testing process sound like a mirror of production? (Including
> configuration?)
>
>
> -will
>
>
> On 11/5/15 7:33 AM, Rob Audenaerde wrote:
>
>> Hi all,
>>
>> I'm currently investigating an issue we have with our index. It keeps
>> getting bigger, and I don't get why.
>>
>> Here is our use case:
>>
>> We index a database of about 4 million records, spread over a few hundred
>> tables. The data consists of a mix of text, dates, numbers, etc. We also
>> add all these fields as facets.
>> Each night we delete about 90% of the data, which in testing reduces the
>> index size significantly.
>> We store the data as StoredFields as well, to prevent having to access the
>> database at all.
>> We use FloatAssociationFacetField fields for the facets.
>>
>>
>> In production, however, the index only seems to grow, reaching 71 GB for
>> these records after a month of running.
>>
>> It seems that Lucene's index is just getting bigger there.
>>
>> We use Lucene 5.3 on CentOS, Java 8 64-bit.
>>
>> The taxonomy-index does not grow significantly.
>>
>> How should I go about checking what is wrong?
>>
>> Thanks!
>>
>>
>
