lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-6161) Applying deletes is sometimes dog slow
Date Wed, 21 Jan 2015 22:26:35 GMT


Robert Muir commented on LUCENE-6161:

Just a few minor thoughts:

Some of the iteration is more awkward now; it might be nice to open a followup to clean this up. delGen is awkward to see being held in PrefixCodedTerms, and we have an iterator API that is neither a TermsEnum nor an Iterable but yet another thing instead. I wonder if we could keep the same logic but use a more natural API; if that would just make the code even more awkward, then screw it :)
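
For context, a rough sketch of the kind of "third" iterator shape being described; the class and method names here are illustrative, not the actual PrefixCodedTerms API:

{noformat}
import java.io.IOException;

import org.apache.lucene.util.BytesRef;

// Hypothetical sketch only, not the real PrefixCodedTerms API: the iterator
// walks sorted terms like a TermsEnum, but also carries per-term metadata
// (field, delGen), so it fits neither Iterator<BytesRef> nor TermsEnum.
abstract class FrozenTermIterator {
  abstract BytesRef next() throws IOException; // null once exhausted
  abstract String field();                     // field of the current term
  abstract long delGen();                      // delete generation the term belongs to
}
{noformat}

The per-term delGen is the extra state that seems to keep this from being a plain Iterable, which is what would make a "more natural" API tricky.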

For now, though, I think we should fix the issue.

> Applying deletes is sometimes dog slow
> --------------------------------------
>                 Key: LUCENE-6161
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk
>         Attachments: LUCENE-6161.patch, LUCENE-6161.patch, LUCENE-6161.patch, LUCENE-6161.patch,
> I hit this while testing various use cases for LUCENE-6119 (adding auto-throttle to ConcurrentMergeScheduler).
> When I tested "always call updateDocument" (each add buffers a delete term), with many indexing threads, opening an NRT reader once per second (forcing all deleted terms to be applied), I see that BufferedUpdatesStream.applyDeletes sometimes seems to take a loooong time, e.g.:
> {noformat}
> BD 0 [2015-01-04 09:31:12.597; Lucene Merge Thread #69]: applyDeletes took 339 msec for 10 segments, 117 deleted docs, 607333 visited terms
> BD 0 [2015-01-04 09:31:18.148; Thread-4]: applyDeletes took 5533 msec for 62 segments, 10989 deleted docs, 8517225 visited terms
> BD 0 [2015-01-04 09:31:21.463; Lucene Merge Thread #71]: applyDeletes took 1065 msec for 10 segments, 470 deleted docs, 1825649 visited terms
> BD 0 [2015-01-04 09:31:26.301; Thread-5]: applyDeletes took 4835 msec for 61 segments, 14676 deleted docs, 9649860 visited terms
> BD 0 [2015-01-04 09:31:35.572; Thread-11]: applyDeletes took 6073 msec for 72 segments, 13835 deleted docs, 11865319 visited terms
> BD 0 [2015-01-04 09:31:37.604; Lucene Merge Thread #75]: applyDeletes took 251 msec for 10 segments, 58 deleted docs, 240721 visited terms
> BD 0 [2015-01-04 09:31:44.641; Thread-11]: applyDeletes took 5956 msec for 64 segments, 15109 deleted docs, 10599034 visited terms
> BD 0 [2015-01-04 09:31:47.814; Lucene Merge Thread #77]: applyDeletes took 396 msec for 10 segments, 137 deleted docs, 719914 visited terms
> {noformat}
> What this means is even though I want an NRT reader every second, often I don't get one for up to ~7 or more seconds.
> This is on an SSD, machine has 48 GB RAM, heap size is only 2 GB.  12 indexing threads.
> As hideously complex as this code is, I think there are some inefficiencies, but fixing them could be hard / make code even hairier ...
> Also, this code is mega-locked: holds IW's lock, holds BD's lock.  It blocks things like merges kicking off or finishing...
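
For readers not deep in this code, a hedged sketch of the contention pattern being described, with hypothetical monitors standing in for the IndexWriter and BufferedUpdatesStream locks:

{noformat}
// Illustrative only, not the actual IndexWriter internals: the slow
// applyDeletes work runs while both locks are held, so anything else that
// needs the IW lock (e.g. a merge kicking off or finishing) blocks for the
// whole duration.
final class LockingSketch {
  private final Object iwLock = new Object(); // stands in for IW's monitor
  private final Object bdLock = new Object(); // stands in for BD's monitor

  void applyAllDeletes(Runnable applyDeletes) {
    synchronized (iwLock) {
      synchronized (bdLock) {
        applyDeletes.run(); // potentially seconds of work under both locks
      }
    }
  }
}
{noformat}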
> E.g., we pull the MergedIterator many times on the same set of sub-iterators.  Maybe we can create the sorted terms up front and reuse that?
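
A minimal sketch of that "create the sorted terms up front and reuse" idea, with hypothetical names (plain Strings standing in for BytesRef terms; this is not the actual patch):

{noformat}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge the per-update sorted term lists once,
// dropping duplicates, so per-segment application can reuse the merged
// result instead of re-pulling a MergedIterator for every segment.
final class MergeOnce {
  static List<String> mergedSortedTerms(List<List<String>> sortedLists) {
    // Heap of cursors {listIndex, position}, ordered by the term pointed at.
    PriorityQueue<int[]> pq = new PriorityQueue<>(
        Comparator.comparing((int[] c) -> sortedLists.get(c[0]).get(c[1])));
    for (int i = 0; i < sortedLists.size(); i++) {
      if (!sortedLists.get(i).isEmpty()) {
        pq.add(new int[] {i, 0});
      }
    }
    List<String> merged = new ArrayList<>();
    while (!pq.isEmpty()) {
      int[] c = pq.poll();
      String term = sortedLists.get(c[0]).get(c[1]);
      // The removeDuplicates equivalent: skip terms equal to the last emitted.
      if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(term)) {
        merged.add(term);
      }
      if (++c[1] < sortedLists.get(c[0]).size()) {
        pq.add(c); // advance this list's cursor and re-heap it
      }
    }
    return merged; // built once, then iterated per segment
  }
}
{noformat}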
> Maybe we should go "term stride" (one term visits all N segments) not "segment stride" (visit each segment, iterating all deleted terms for it).  Just iterating the terms to be deleted takes a sizable part of the time, and we now do that once for every segment in the index.
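
The proposed loop-order change, as a hedged sketch (the Seg interface is hypothetical; the real logic lives in BufferedUpdatesStream):

{noformat}
import java.util.List;

// Hypothetical sketch of the two iteration orders, not the actual Lucene
// code. Since iterating the delete terms is itself a big cost, the order of
// the two loops matters.
final class StrideSketch {
  interface Seg { void seekExactAndDelete(String term); }

  // Segment stride: the sorted delete-term list is re-iterated once per
  // segment, paying the term-iteration cost |segments| times.
  static void segmentStride(List<Seg> segments, List<String> sortedTerms) {
    for (Seg seg : segments) {
      for (String t : sortedTerms) {
        seg.seekExactAndDelete(t);
      }
    }
  }

  // Term stride: the term list is iterated exactly once; each term fans out
  // to all N segments.
  static void termStride(List<Seg> segments, List<String> sortedTerms) {
    for (String t : sortedTerms) {
      for (Seg seg : segments) {
        seg.seekExactAndDelete(t);
      }
    }
  }
}
{noformat}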
> Also, the "isUnique" bit in LUCENE-6005 should help here, since if we know the field
is unique, we can stop seekExact once we found a segment that has the deleted term, we can
maybe pass false for removeDuplicates to MergedIterator...
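
And a hedged sketch of that early exit (hypothetical names again; LUCENE-6005's isUnique flag is the real mechanism being referenced):

{noformat}
import java.util.List;

// Hypothetical sketch, not the actual code: if the field is known to be
// unique, each delete term matches at most one document in the whole index,
// so we can stop probing segments as soon as one seekExact hits.
final class UniqueFieldSketch {
  interface Seg { boolean seekExactAndDelete(String term); } // true if found

  static void termStrideUnique(List<Seg> segments, List<String> sortedTerms,
                               boolean fieldIsUnique) {
    for (String t : sortedTerms) {
      for (Seg seg : segments) {
        if (seg.seekExactAndDelete(t) && fieldIsUnique) {
          break; // unique field: no other segment can contain this term
        }
      }
    }
  }
}
{noformat}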
