lucene-dev mailing list archives

From Peter Sturge <peter.stu...@gmail.com>
Subject commitReserveDuration, backups and saveCommitPoint
Date Thu, 02 Sep 2010 16:47:18 GMT
Hi,

This post is related to SOLR-1475 - 'Java-based replication doesn't
properly reserve its commit point during backups', and to index backups in
general.

In Solr 1.4 and 1.4.1, the SOLR-1475 patch is certainly there, but I don't
believe it truly addresses the problem.

Here's why:

When a 'backup' command is received by the ReplicationHandler, it creates a
SnapShooter instance and asynchronously takes a full file snapshot of the
current commit point.
The commit point being copied, however, is only reserved for the length of
'commitReserveDuration', which defaults to 10 seconds. Once that time has
elapsed, the reservation is cleared on the next commit (see cleanReserves()
in IndexDeletionPolicyWrapper.java).

If you perform a backup and no commits occur during this time, it's fine,
because cleanReserves() is not called. If you do get a commit during the
backup process, and the backup takes longer than 10 seconds, the whole
snapshot operation fails (because delete() doesn't see the commit point in
savedCommits - see below).
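
To make the timing concrete, here is a minimal sketch of a time-based
reservation that lapses while a long backup is still running. It is a
simplified model and not Solr's actual code: the class ReserveSketch, the
isProtected() helper and the explicit 'nowMs' clock parameter are
assumptions made so the example is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of a time-limited commit reservation. */
final class ReserveSketch {
    // commit version -> time (ms) at which the reservation lapses
    private final Map<Long, Long> reserves = new ConcurrentHashMap<>();

    /** Reserve a commit version for durationMs, starting at nowMs. */
    void reserve(long version, long durationMs, long nowMs) {
        reserves.put(version, nowMs + durationMs);
    }

    /** Run on each new commit: drop reservations whose deadline has passed. */
    void cleanReserves(long nowMs) {
        reserves.values().removeIf(deadline -> deadline < nowMs);
    }

    /** True while the commit version is still reserved. */
    boolean isProtected(long version) {
        return reserves.containsKey(version);
    }

    public static void main(String[] args) {
        ReserveSketch policy = new ReserveSketch();
        policy.reserve(42L, 10_000L, 0L);            // backup starts, 10 s reservation

        policy.cleanReserves(5_000L);                // commit arrives at t = 5 s
        System.out.println(policy.isProtected(42L)); // true: still reserved

        policy.cleanReserves(15_000L);               // commit at t = 15 s, backup still copying
        System.out.println(policy.isProtected(42L)); // false: files may now be deleted
    }
}
```

In this model nothing goes wrong unless a commit arrives after the deadline,
which matches the behaviour described above: a slow backup only fails when a
commit lands during it.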

The non-coding workaround is to explicitly set 'commitReserveDuration' in
solrconfig.xml to a value higher than the maximum time it takes to do a
full backup. As this parameter looks to be used by backup
snapshots/postCommits only, setting it to a high value should be ok (but I
could be wrong about this - does anyone familiar with the
SnapShooter/DeletionPolicy code know why this might be bad?). I've tested
it set to 02:00:00 (2 hours) with no ill effects.

*Possible patch to SOLR-1475?*
Looking at the code in IndexDeletionPolicyWrapper.java, I believe the
problem can be found in saveCommitPoint(). The 'savedCommits' HashMap is
referenced and checked, but it's always empty as there is no
savedCommits.put().

It looks to be a one-line fix:

IndexDeletionPolicyWrapper.java:103:
  /** Permanently prevent this commit point from being deleted.
   * A counter is used to allow a commit point to be correctly saved and
   * released multiple times. */
  public synchronized void saveCommitPoint(Long indexCommitVersion) {
    AtomicInteger reserveCount = savedCommits.get(indexCommitVersion);
    if (reserveCount == null) reserveCount = new AtomicInteger();
    reserveCount.incrementAndGet();
+   savedCommits.put(indexCommitVersion, reserveCount);
  }
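
The effect of the missing put() can be reproduced outside Solr. The sketch
below is a standalone illustration, not the Solr class itself: the class
SavedCommitsSketch and the methods saveWithoutPut(), saveWithPut() and
count() are hypothetical names, and only the body of the save methods
follows the code quoted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical standalone model of the savedCommits counter map. */
final class SavedCommitsSketch {
    private final Map<Long, AtomicInteger> savedCommits = new HashMap<>();

    /** Same logic as the quoted method without put(): the new counter is discarded. */
    synchronized void saveWithoutPut(Long version) {
        AtomicInteger reserveCount = savedCommits.get(version);
        if (reserveCount == null) reserveCount = new AtomicInteger();
        reserveCount.incrementAndGet();
    }

    /** Same logic with the proposed put(): the counter is stored in the map. */
    synchronized void saveWithPut(Long version) {
        AtomicInteger reserveCount = savedCommits.get(version);
        if (reserveCount == null) reserveCount = new AtomicInteger();
        reserveCount.incrementAndGet();
        savedCommits.put(version, reserveCount);
    }

    /** Number of outstanding saves recorded for a version (0 if absent). */
    synchronized int count(Long version) {
        AtomicInteger c = savedCommits.get(version);
        return c == null ? 0 : c.get();
    }

    public static void main(String[] args) {
        SavedCommitsSketch sketch = new SavedCommitsSketch();

        sketch.saveWithoutPut(7L);
        System.out.println(sketch.count(7L)); // 0: the map never saw the counter

        sketch.saveWithPut(7L);
        sketch.saveWithPut(7L);
        System.out.println(sketch.count(7L)); // 2: saved twice, counted twice
    }
}
```

Without the put(), any later lookup of the version in the map finds nothing,
so a check against savedCommits cannot protect the commit point.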

If the experts agree this is a good fix, I guess it should go into the
SOLR-1475 issue etc., but I thought I'd run it past those more
knowledgeable about this part of the code base before entering it into JIRA.
Any thoughts or comments are greatly appreciated.

Thanks,
Peter
