lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: max docs, deleted docs optimization
Date Tue, 31 Oct 2017 15:00:48 GMT
1> 2 lakh (200,000) at most. If the standard background merging is going
on, it may be less than that.

2> Some, but whether you notice or not is an open question. In an
index with only 10 lakh docs, it's unlikely that even 50% deleted
documents is going to make much of a difference.

3> Yes, the deleted docs stay in the segment until they're merged away.
Lucene is very efficient (according to Mike McCandless) at skipping
deleted docs.

4> Optimize rewrites all segments, purging deleted documents. However,
it has some pitfalls, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
In general it's simply not recommended to optimize. There is a Solr
JIRA discussing this in detail, but I can't get to the site to link it
right now.

In general, as an index is updated segments are merged together and
during that process any deleted documents are purged.
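
To make the arithmetic in 1> concrete, here is a rough sketch of how the
counters move. It is a toy model, not Solr code: the class and method
names are made up for illustration, and it only assumes that an atomic
update marks the old copy of a document as deleted and indexes a new
copy, and that maxDoc counts live plus deleted documents.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    """Toy model (hypothetical, not a Solr API) of a core's doc counters."""

    num_docs: int          # live, searchable documents
    deleted_docs: int = 0  # marked deleted, not yet merged away

    @property
    def max_doc(self) -> int:
        # maxDoc counts live and deleted documents together.
        return self.num_docs + self.deleted_docs

    def atomic_update(self, count: int) -> None:
        # Each update deletes the old copy and adds a new one,
        # so the live count stays the same and deletions grow.
        self.deleted_docs += count

    def merge(self, purged: int) -> None:
        # A segment merge drops the deleted docs in the merged segments.
        self.deleted_docs -= min(purged, self.deleted_docs)

    def deleted_fraction(self) -> float:
        return self.deleted_docs / self.max_doc if self.max_doc else 0.0


stats = IndexStats(num_docs=1_000_000)  # 10 lakh docs
stats.atomic_update(200_000)            # partially update 2 lakh docs
print(stats.deleted_docs, stats.max_doc, round(stats.deleted_fraction(), 3))
```

How many deleted docs a given merge actually purges depends on which
segments the merge policy picks, so the `purged` argument is just a
stand-in for that.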

Two resources:
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

See the third animation (TieredMergePolicy, which is the default) here:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Best,
Erick

On Tue, Oct 31, 2017 at 4:40 AM, kshitij tyagi
<kshitij.shopclues@gmail.com> wrote:
> Hi,
>
> I am using atomic updates to update one of the fields. I want to know:
>
> 1. If the total docs in the core are 10 lakh and I partially update 2 lakh
> docs, then what will be the number of deleted docs?
>
> 2. Does a higher number of deleted docs have an effect on query time? That
> is, does query time increase if there are more deleted docs?
>
> 3. Are deleted docs present in the segment? During query execution, are
> deleted docs traversed?
>
> 4. What does the optimize button in the Solr admin do exactly?
>
> Help is much appreciated.
>
> Regards,
> Kshitij
