lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments
Date Wed, 10 Nov 2010 17:25:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930658#action_12930658
] 

Jason Rutherglen commented on LUCENE-2680:
------------------------------------------

I think I've taken LUCENE-2680 as far as I can, though I'll
probably add some more assertions in there for good measure,
such as whether or not a delete has in fact been applied etc. It
seems to be working though again I should add more assertions to
that effect. I think there's a niggling sync issue in there as
TestThreadedOptimize only fails when I try to run it 100s of
times. I think the sync on DW is causing a wait notify to be
missed or skipped or something like that, as occasionally the
isOptimized call fails as well. This is likely related to the
appearance of deletes not being applied to segment(s) as
evidenced by the difference in the actual doc count and the
expected doc count.

Below is the most common assertion failure. Maybe I should
upload my patch that includes a method that iterates 200 times
on testThreadedOptimize?

{code}
[junit] ------------- ---------------- ---------------
    [junit] Testsuite: org.apache.lucene.index.TestThreadedOptimize
    [junit] Testcase: testThreadedOptimize(org.apache.lucene.index.TestThreadedOptimize):
FAILED
    [junit] expected:<248> but was:<266>
    [junit] junit.framework.AssertionFailedError: expected:<248> but was:<266>
    [junit] 	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:878)
    [junit] 	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:844)
    [junit] 	at org.apache.lucene.index.TestThreadedOptimize.runTest(TestThreadedOptimize.java:119)
    [junit] 	at org.apache.lucene.index.TestThreadedOptimize.testThreadedOptimize(TestThreadedOptimize.java:141)
    [junit] 
    [junit] 
    [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.748 sec
{code}

> Improve how IndexWriter flushes deletes against existing segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2680
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2680
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch,
LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch
>
>
> IndexWriter buffers up all deletes (by Term and Query) and only
> applies them if 1) commit or NRT getReader() is called, or 2) a merge
> is about to kickoff.
> We do this because, for a large index, it's very costly to open a
> SegmentReader for every segment in the index.  So we defer as long as
> we can.  We do it just before merge so that the merge can eliminate
> the deleted docs.
> But, most merges are small, yet in a big index we apply deletes to all
> of the segments, which is really very wasteful.
> Instead, we should only apply the buffered deletes to the segments
> that are about to be merged, and keep the buffer around for the
> remaining segments.
> I think it's not so hard to do; we'd have to have generations of
> pending deletions, because the newly merged segment doesn't need the
> same buffered deletions applied again.  So every time a merge kicks
> off, we pinch off the current set of buffered deletions, open a new
> set (the next generation), and record which segment was created as of
> which generation.
> This should be a very sizable gain for large indices that mix
> deletes, though, less so in flex since opening the terms index is much
> faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message