lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode
Date Tue, 17 Nov 2009 18:44:39 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779037#action_12779037 ]

Jason Rutherglen commented on LUCENE-2047:
------------------------------------------

{quote}resolving deletes syncs the entire IW + DW, so that
blocks indexing new docs, launching/committing merges, flushing,
etc... I just don't think NRT is really a driver for
this...{quote}

Right, it's a general improvement that's come to light during
NRT development and testing. NRT is great in this regard because
it stresses Lucene in a completely new way, which also improves
it for the general batch use case (i.e. users can simply start
using NRT features when they need to, without worry).

{quote} 1: Analyzing hits an exception for a doc, its doc id
has already been allocated so we mark it for deletion later (on
flush?) in BufferedDeletes. {quote}

So there's only one use case right now: when analyzing an
individual doc fails. The update doc adds the term to the
BufferedDeletes for later application. Makes sense. I think we
can resolve the update doc term in the foreground. I'm wondering
if we need a separate doc id queue for these? My hunch is yes,
because the other doc ids need to be applied even on an IO
exception, whereas the update doc ids will not be applied?

{quote} 2: RAM Buffer writing hits an exception, we've had
updates which marked deletes in current segments, however they
haven't been applied yet because they're stored in
BufferedDeletes docids. They're applied on successful flush.
{quote}

In essence we need to implement number 2?

{quote}Analyzing or any other "non-aborting" exception,
right.{quote}

What is an example of a non-aborting exception?

{quote} use 1.5's ReentrantReadWriteLock {quote}

I'll incorporate RRWL into the follow-on concurrent updating
patch.
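
As a rough illustration of what using RRWL could look like (a
sketch under assumptions, not the patch): threads that record
resolved deletes share the read lock, while an operation needing
exclusive access, such as a flush, takes the write lock. The
class DeleteLockSketch and its methods are hypothetical names
made up for this example; only ReentrantReadWriteLock and
ConcurrentHashMap are real JDK classes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: not Lucene code, names are invented for illustration.
class DeleteLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Thread-safe set, because several threads may hold the read lock at once.
    private final Set<Integer> pendingDeletes = ConcurrentHashMap.newKeySet();

    // Many threads may record deletes concurrently under the shared read lock.
    void markDeleted(int docId) {
        lock.readLock().lock();
        try {
            pendingDeletes.add(docId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Flush needs exclusive access, so it takes the write lock.
    // Returns how many distinct doc ids were pending.
    int flush() {
        lock.writeLock().lock();
        try {
            int count = pendingDeletes.size();
            pendingDeletes.clear();
            return count;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The point of the read/write split in this sketch is that deleting
threads do not block each other, only the flush does.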

{quote} Whoa! Merely thinking about and discussing even how to
run proper tests for NRT, let alone the possible improvements to
Lucene on the table, is sucking up all my time {quote}

Yow, I didn't know. Thanks!

> IndexWriter should immediately resolve deleted docs to docID in near-real-time mode
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs.  This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path.  And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.
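
The difference between buffering a delete Term and resolving it
in the foreground, as described in the issue above, can be
sketched with a toy model. This is not Lucene's IndexWriter:
ToyDeleteResolver, its methods, and the in-memory postings map
(standing in for a pooled SegmentReader) are all hypothetical
and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: not Lucene code, all names are invented for illustration.
class ToyDeleteResolver {
    // term -> doc ids containing it (stands in for an open, pooled reader)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final List<String> bufferedTerms = new ArrayList<>();
    private final Set<Integer> deletedDocIds = new HashSet<>();

    void addDoc(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    // Buffered mode: only remember the term; doc ids are resolved at flush.
    void deleteBuffered(String term) {
        bufferedTerms.add(term);
    }

    // Foreground mode: resolve the term to doc ids immediately.
    void deleteForeground(String term) {
        resolve(term);
    }

    // Resolves all buffered terms; returns how many terms were resolved here.
    int flushDeletes() {
        int resolved = bufferedTerms.size();
        for (String term : bufferedTerms) {
            resolve(term);
        }
        bufferedTerms.clear();
        return resolved;
    }

    boolean isDeleted(int docId) {
        return deletedDocIds.contains(docId);
    }

    private void resolve(String term) {
        deletedDocIds.addAll(
            postings.getOrDefault(term, Collections.<Integer>emptyList()));
    }
}
```

In this model a foreground delete leaves nothing for flushDeletes()
to do, which is the work the issue proposes taking off the reopen
path.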

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

