lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2402) Add an explicit method to invoke IndexDeletionPolicy
Date Wed, 21 Apr 2010 06:08:49 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Shai Erera updated LUCENE-2402:
-------------------------------

    Attachment: LUCENE-2402.patch

Patch changes deleteUnusedFiles to call IFD.checkpoint and also adds a testDeleteUnusedFiles2
to TestIndexWriter.

Currently, TestIndexWriterReader.testDuringAddIndexes fails, if deleteUnusedFiles is coded
like this:
{code}
public synchronized void deleteUnusedFiles() throws IOException {
  deleter.checkpoint(segmentInfos, true);
}
{code}

The failure happens in CommitPoint's ctor in the assert statement which verifies the SegmentInos
does not have external Directory. When I debug-traced the test, it passed and so I concluded
it's a concurrency issue (and indeed testDuringAddIndexes spawns several threads.

addIndexesNoOptimize does change SegmentInfos as it adds indexes, however at the end it 'fixes'
their Directory reference. I wondered how is regular commit() works when addIndexesNoOptimize
is called, but couldn't find any synchronization block where one blocks the other. Eventually,
I've changed deleteUnusedFiles to this:
{code}
public synchronized void deleteUnusedFiles() throws IOException {
  synchronized (commitLock) {
    deleter.checkpoint(rollbackSegmentInfos, true);
    // deleter.checkpoint((SegmentInfos) segmentInfos.clone(), true);
  }
}
{code}

I've tried to sync on commitLock (which seems good anyway), but the test kept failing. Even
cloning SI did not work because it might have changed just before the clone. Only when passing
rollbackSI to checkpoint does the test pass. But I'm not sure if that's the right solution,
as when I debug-traced it and put a break point just before the call to checkpoint, SI included
one segment w/ a different name than rollbackSI ...

BTW, the test fails on DirReader.doClose, where it checks if writer != null and then calls
deleteUnusedFiles. So I guess it's a NRT problem only.

In general, that that addIndexesNoOptimize messes w/ SI seems dangerous to me, because that's
undocumented and unprotected - e.g. if someone extends IW and adds some logic which requires
reading SI ... I'm not sure how to solve it, but that seems unrelated to that issue (probably
much more complicated to solve).

> Add an explicit method to invoke IndexDeletionPolicy
> ----------------------------------------------------
>
>                 Key: LUCENE-2402
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2402
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1
>
>         Attachments: LUCENE-2402.patch
>
>
> Today, if one uses an IDP which holds onto segments, such as SnapshotDeletionPolicy,
or any other IDP in the tests, those segments are left in the index even if the IDP no longer
references them, until IW.commit() is called (and actually does something). I'd like to add
a specific method to IW which will invoke the IDP's logic and get rid of the unused segments
w/o forcing the user to call IW.commit(). There are a couple of reasons for that:
> * Segments take up sometimes valuable HD space, and the application may wish to reclaim
that space immediately. In some scenarios, the index is updated once in several hours (or
even days), and waiting until then may not be acceptable.
> * I think it's a cleaner solution than waiting for the next commit() to happen. One can
still wait for it if one wants, but otherwise it will give you the ability to immediately
get rid of those segments.
> * TestSnapshotDeletionPolicy includes this code, which only strengthens (IMO) the need
for such method:
> {code}
> // Add one more document to force writer to commit a
> // final segment, so deletion policy has a chance to
> // delete again:
> Document doc = new Document();
> doc.add(new Field("content", "aaa", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> writer.addDocument(doc);
> {code}
> If IW had an explicit method, that code would not need to exist there at all ...
> Here comes the fun part - naming the baby:
> * invokeDeletionPolicy -- describes exactly what is going to happen. However, if the
user did not set IDP at all (relying on default, which I think many do), users won't understand
what is it.
> * deleteUnusedSegments - more user-friendly, assuming users understand what 'segments'
are.
> BTW, IW already has deleteUnusedFiles() which only tries to delete unreferenced files
that failed to delete before (such as on Windows, due to e.g. open readers). Perhaps instead
of inventing a new name, we can change IW.deleteUnusedFiles to call IndexFileDeleter.checkpoint
(instead of deletePendingFiles) which deletes those files + calls IDP.onCommit().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message