lucene-java-user mailing list archives

From Jürgen Albert <j.alb...@data-in-motion.biz>
Subject Re: index size growing while deleting
Date Tue, 10 Nov 2015 12:32:48 GMT
Hi Rob,

We use a SearcherManager to acquire a fresh IndexSearcher for every query.
From the searcher we get the IndexReader. After the query, call
searcherManager.release(searcher). The SearcherManager takes care of the
rest.
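
Why the release matters can be shown with a toy model in plain Java. This is
only a sketch and not Lucene code: ToyManager, Snapshot and
deletedGenerations are hypothetical names, and the reference counting is
assumed to work roughly like this. Each snapshot stands in for an open
reader that pins one generation of index files, and the files only become
deletable once every holder has released it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: illustrates acquire/release reference counting, not Lucene itself. */
public class RefCountDemo {

    /** Stands in for an open reader that pins one generation of index files. */
    static final class Snapshot {
        final int generation;
        int refCount = 1; // the manager itself holds the first reference

        Snapshot(int generation) {
            this.generation = generation;
        }
    }

    /** Stands in for a manager that hands out the current snapshot and counts references. */
    static final class ToyManager {
        private Snapshot current = new Snapshot(0);
        /** Generations whose files could be removed because nothing references them. */
        final List<Integer> deletedGenerations = new ArrayList<>();

        synchronized Snapshot acquire() {
            current.refCount++;
            return current;
        }

        synchronized void release(Snapshot snapshot) {
            snapshot.refCount--;
            if (snapshot.refCount == 0) {
                // Last reference gone: the old files are now unreferenced.
                deletedGenerations.add(snapshot.generation);
            }
        }

        /** Swap in a newer snapshot and drop the manager's reference to the old one. */
        synchronized void refresh() {
            Snapshot old = current;
            current = new Snapshot(old.generation + 1);
            release(old);
        }
    }

    public static void main(String[] args) {
        ToyManager manager = new ToyManager();
        Snapshot inUse = manager.acquire(); // a query is still running on generation 0
        manager.refresh();                  // the index moved on to generation 1
        System.out.println("deletable before release: " + manager.deletedGenerations);
        manager.release(inUse);             // the query finished and gave its snapshot back
        System.out.println("deletable after release: " + manager.deletedGenerations);
    }
}
```

In this model, a caller that never calls release keeps generation 0 pinned
forever, which is the kind of leak that would let old files pile up on disk.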

Regards,

Jürgen.

On 10.11.2015 at 13:27, Rob Audenaerde wrote:
> Hi Jürgen, Michael
>
> Thanks! I seem to be able to reduce the index size by closing and
> restarting our application. This reduces the index size from 22G to 4G,
> which is roughly the expected size. The infoStream also reports
> 'removing unreferenced file' (IFD 0 [2015-11-10T12:21:49.293Z; main]: init:
> removing unreferenced file '...)
>
> Now I just need to figure out how to close the IndexReader while keeping
> the application running. I guess I should/could do something with
> openIfChanged. Will look further.
>
> -Rob
>
>
>
> On Tue, Nov 10, 2015 at 12:19 PM, Jürgen Albert <j.albert@data-in-motion.biz> wrote:
>> Hi Rob,
>>
>> We had a similar problem. In our case we had open index readers that
>> blocked the index from merging its segments and thus from deleting the
>> marked segments.
>>
>> Regards,
>>
>> Jürgen.
>>
>>
>> On 06.11.2015 at 08:59, Rob Audenaerde wrote:
>>
>>> Hi will, others
>>>
>>> Thanks for your reply.
>>>
>>> As far as I understand it, deleting a document just sets the deleted
>>> bit, and the documents are removed when segments are merged. (I am not
>>> really sure what this means exactly; I guess the document gets removed
>>> from the store and the terms no longer refer to that document. I am not
>>> sure whether terms get removed if they are no longer needed, etc.) If
>>> there are resources to read to improve my understanding, I have not found
>>> them (yet); if you could point me to some, that would be great!
>>>
>>> I use the default IndexWriterConfig, which I see uses TieredMergePolicy. I
>>> never close my IndexWriter; as I use NRT searching, I just always keep it
>>> open.
>>>
>>> My two guesses are that: a) old segments are not removed from disk or b)
>>> deletes are not cleaned up as well as I thought they would be.
>>>
>>> I have made a test case which indexes 5 million randomly generated rows
>>> (five iterations, five indexing threads, indexing and then deleting all
>>> such documents after each iteration with deleteByQuery). I see the
>>> Taxonomy ever growing (which is logical, because facet ordinals are never
>>> removed, as far as I understand); the index grows, but also shrinks when
>>> deleting. So I cannot reproduce my problem easily :(
>>>
>>> I will start diving into the Lucene source code, but I was hoping I just
>>> did something wrong.
>>>
>>> Any hints are appreciated!
>>>
>>> -Rob
>>>
>>>
>>> On Thu, Nov 5, 2015 at 2:52 PM, will <wmartinusa@gmail.com> wrote:
>>>
>>>> Hi Rob:
>>>> Do you understand how deletes work and how an index is compacted?
>>>>
>>>> There are some configuration/runtime activities you don't mention. And
>>>> you make the testing process sound like a mirror of production
>>>> (including configuration)?
>>>>
>>>>
>>>> -will
>>>>
>>>>
>>>> On 11/5/15 7:33 AM, Rob Audenaerde wrote:
>>>>
>>>>> Hi all,
>>>>> I'm currently investigating an issue we have with our index. It keeps
>>>>> getting bigger, and I don't get why.
>>>>>
>>>>> Here is our use case:
>>>>>
>>>>> We index a database of about 4 million records, spread over a few
>>>>> hundred tables. The data consists of a mix of text, dates, numbers etc.
>>>>> We also add all these fields as facets.
>>>>> Each night we delete about 90% of the data, which in testing reduces the
>>>>> index size significantly.
>>>>> We store the data as StoredFields as well, to prevent having to access
>>>>> the database at all.
>>>>> We use FloatAssociationFacetField fields for the facets.
>>>>>
>>>>>
>>>>> In production, however, it seems the index is only growing, up to 71 GB
>>>>> for these records after a month of running.
>>>>>
>>>>> It seems that Lucene's index is just getting bigger there.
>>>>>
>>>>> We use Lucene 5.3 on CentOS, Java 8 64-bit.
>>>>>
>>>>> The taxonomy-index does not grow significantly.
>>>>>
>>>>> How should I go about checking what is wrong?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>> --
>> Jürgen Albert
>> Managing Director
>>
>> Data In Motion UG (haftungsbeschränkt)
>>
>> Kahlaische Str. 4
>> 07745 Jena
>>
>> Mobile: 0157-72521634
>> E-Mail: j.albert@datainmotion.de
>> Web: www.datainmotion.de
>>
>> XING:   https://www.xing.com/profile/Juergen_Albert5
>>
>> Legal
>>
>> Jena HBR 507027
>> USt-IdNr: DE274553639
>> St.Nr.: 162/107/04586
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>



