lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6161) Applying deletes is sometimes dog slow
Date Wed, 21 Jan 2015 22:26:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286421#comment-14286421
] 

Robert Muir commented on LUCENE-6161:
-------------------------------------

Just a few minor thoughts:

Some of the iteration is more awkward now, it might be nice to open a followup to clean this
up.
delGen is awkward to see being held in PrefixCodedTerms, and we have an iterator api that
... is neither termsenum or iterable but another one instead.
I wonder if we could have the same logic, but using a more natural one. if it would just make
the code even more awkward, then screw it :)

We should fix the issue though for now I think.

> Applying deletes is sometimes dog slow
> --------------------------------------
>
>                 Key: LUCENE-6161
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6161
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-6161.patch, LUCENE-6161.patch, LUCENE-6161.patch, LUCENE-6161.patch,
LUCENE-6161.patch
>
>
> I hit this while testing various use cases for LUCENE-6119 (adding auto-throttle to ConcurrentMergeScheduler).
> When I tested "always call updateDocument" (each add buffers a delete term), with many
indexing threads, opening an NRT reader once per second (forcing all deleted terms to be applied),
I see that BufferedUpdatesStream.applyDeletes sometimes seems to take a loooong time, e.g.:
> {noformat}
> BD 0 [2015-01-04 09:31:12.597; Lucene Merge Thread #69]: applyDeletes took 339 msec for
10 segments, 117 deleted docs, 607333 visited terms
> BD 0 [2015-01-04 09:31:18.148; Thread-4]: applyDeletes took 5533 msec for 62 segments,
10989 deleted docs, 8517225 visited terms
> BD 0 [2015-01-04 09:31:21.463; Lucene Merge Thread #71]: applyDeletes took 1065 msec
for 10 segments, 470 deleted docs, 1825649 visited terms
> BD 0 [2015-01-04 09:31:26.301; Thread-5]: applyDeletes took 4835 msec for 61 segments,
14676 deleted docs, 9649860 visited terms
> BD 0 [2015-01-04 09:31:35.572; Thread-11]: applyDeletes took 6073 msec for 72 segments,
13835 deleted docs, 11865319 visited terms
> BD 0 [2015-01-04 09:31:37.604; Lucene Merge Thread #75]: applyDeletes took 251 msec for
10 segments, 58 deleted docs, 240721 visited terms
> BD 0 [2015-01-04 09:31:44.641; Thread-11]: applyDeletes took 5956 msec for 64 segments,
15109 deleted docs, 10599034 visited terms
> BD 0 [2015-01-04 09:31:47.814; Lucene Merge Thread #77]: applyDeletes took 396 msec for
10 segments, 137 deleted docs, 719914 visit
> {noformat}
> What this means is even though I want an NRT reader every second, often I don't get one
for up to ~7 or more seconds.
> This is on an SSD, machine has 48 GB RAM, heap size is only 2 GB.  12 indexing threads.
> As hideously complex as this code is, I think there are some inefficiencies, but fixing
them could be hard / make code even hairier ...
> Also, this code is mega-locked: holds IW's lock, holds BD's lock.  It blocks things like
merges kicking off or finishing...
> E.g., we pull the MergedIterator many times on the same set of sub-iterators.  Maybe
we can create the sorted terms up front and reuse that?
> Maybe we should go "term stride" (one term visits all N segments) not "segment stride"
(visit each segment, iterating all deleted terms for it).  Just iterating the terms to be
deleted takes a sizable part of the time, and we now do that once for every segment in the
index.
> Also, the "isUnique" bit in LUCENE-6005 should help here, since if we know the field
is unique, we can stop seekExact once we found a segment that has the deleted term, we can
maybe pass false for removeDuplicates to MergedIterator...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message