subversion-commits mailing list archives

From julianf...@apache.org
Subject svn commit: r1092641 - in /subversion/trunk/contrib/server-side/fsfsfixer: ./ README fix-repo fixer/ fixer/__init__.py fixer/find_good_id.py fixer/fix-rev.py
Date Fri, 15 Apr 2011 09:37:54 GMT
Author: julianfoad
Date: Fri Apr 15 09:37:54 2011
New Revision: 1092641

URL: http://svn.apache.org/viewvc?rev=1092641&view=rev
Log:
Add my set of scripts for fixing certain kinds of FSFS corruption.

For more details, see the email from Julian Foad on 2010-10-06, subject
"Fixing FSFS 'Corrupt node-revision' and 'Corrupt representation' errors",
<http://svn.haxx.se/dev/archive-2010-10/0095.shtml>.

* contrib/server-side/fsfsfixer,
  contrib/server-side/fsfsfixer/README,
  contrib/server-side/fsfsfixer/fix-repo,
  contrib/server-side/fsfsfixer/fixer,
  contrib/server-side/fsfsfixer/fixer/__init__.py,
  contrib/server-side/fsfsfixer/fixer/fix-rev.py,
  contrib/server-side/fsfsfixer/fixer/find_good_id.py
    New.

Added:
    subversion/trunk/contrib/server-side/fsfsfixer/
    subversion/trunk/contrib/server-side/fsfsfixer/README   (with props)
    subversion/trunk/contrib/server-side/fsfsfixer/fix-repo   (with props)
    subversion/trunk/contrib/server-side/fsfsfixer/fixer/
    subversion/trunk/contrib/server-side/fsfsfixer/fixer/__init__.py   (with props)
    subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py   (with props)
    subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py   (with props)

Added: subversion/trunk/contrib/server-side/fsfsfixer/README
URL: http://svn.apache.org/viewvc/subversion/trunk/contrib/server-side/fsfsfixer/README?rev=1092641&view=auto
==============================================================================
--- subversion/trunk/contrib/server-side/fsfsfixer/README (added)
+++ subversion/trunk/contrib/server-side/fsfsfixer/README Fri Apr 15 09:37:54 2011
@@ -0,0 +1,21 @@
+The set of scripts in this directory attempts to fix some kinds of
+corruption in an FSFS repository, particularly errors that are reported by
+'svnadmin verify' with the following two kinds of error message:
+
+  svnadmin: Corrupt node-revision '5-12980.0.r12980/5571'
+  svnadmin: Found malformed header in revision file
+
+  svnadmin: Corrupt representation '13001 1496 2082 16645 [...]'
+  svnadmin: Malformed representation header
+
+The files are:
+
+  fix-repo: a Bash script that calls fixer/fix-rev.py on each rev in a range
+  fixer/fix-rev.py: checks and fixes if possible a single revision
+  fixer/find_good_id.py: helper functions for finding correct IDs and offsets
+  fixer/__init__.py: an empty file that signals to Python that this directory
+    is a Python package
+
+For more details, see the email from Julian Foad on 2010-10-06, subject
+"Fixing FSFS 'Corrupt node-revision' and 'Corrupt representation' errors",
+<http://svn.haxx.se/dev/archive-2010-10/0095.shtml>.

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/README
------------------------------------------------------------------------------
    svn:eol-style = native
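
The two error kinds quoted in the README can be told apart mechanically. What follows is a minimal sketch, not part of this commit, that reuses the two regular expressions found in handle_one_error() in fixer/fix-rev.py. The name classify_error and its return shape are assumptions made for illustration, and the sketch is written to run under Python 3, whereas the committed scripts use Python 2 syntax.

```python
import re

# Patterns taken from handle_one_error() in fixer/fix-rev.py.
NODE_REV_RE = re.compile(r"svn.*: Corrupt node-revision '(.*)'")
REP_RE = re.compile(
    r"svn.*: Corrupt representation '([0-9]*) ([0-9]*) ([0-9]*) .*'")


def classify_error(first_line):
    """Hypothetical helper: classify the first line of verifier output.

    Returns ('node-revision', bad_id),
    ('representation', (rev, offset, size)), or None if the line is not
    one of the two kinds described in the README.
    """
    match = NODE_REV_RE.match(first_line)
    if match:
        return ('node-revision', match.group(1))
    match = REP_RE.match(first_line)
    if match:
        return ('representation', match.groups())
    return None


# The sample messages are the ones quoted in the README.
print(classify_error(
    "svnadmin: Corrupt node-revision '5-12980.0.r12980/5571'"))
print(classify_error(
    "svnadmin: Corrupt representation '13001 1496 2082 16645 [...]'"))
```

Only the first three numeric fields of a corrupt representation are captured, matching what fix-rev.py passes on as revision, offset and size.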

Added: subversion/trunk/contrib/server-side/fsfsfixer/fix-repo
URL: http://svn.apache.org/viewvc/subversion/trunk/contrib/server-side/fsfsfixer/fix-repo?rev=1092641&view=auto
==============================================================================
--- subversion/trunk/contrib/server-side/fsfsfixer/fix-repo (added)
+++ subversion/trunk/contrib/server-side/fsfsfixer/fix-repo Fri Apr 15 09:37:54 2011
@@ -0,0 +1,21 @@
+#!/bin/bash
+USAGE="Fix some kinds of corruption in a Subversion repository
+by running './fixer/fix-rev.py' on each revision.
+Usage: $0 REPO-DIR START-REV"
+
+REPO_DIR="$1"
+START_REV="$2"
+
+if [ ! -d "$REPO_DIR" ] || [ "$START_REV" = "" ]; then
+  echo "$USAGE" >&2
+  exit 1
+fi
+
+YOUNGEST="$(svnlook youngest "$REPO_DIR")"
+
+echo "Verifying revisions $START_REV through $YOUNGEST."
+
+for REV in $(seq "$START_REV" "$YOUNGEST"); do
+  echo "=== r$REV"
+  ./fixer/fix-rev.py "$REPO_DIR" "$REV"
+done

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fix-repo
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fix-repo
------------------------------------------------------------------------------
    svn:executable = *
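
For readers who would rather drive the fixer from Python, the loop in fix-repo could be sketched as below. This is an illustration under stated assumptions, not part of the commit: the names revisions_to_check and fix_repo are hypothetical, and collecting the revisions whose fixer run exits non-zero is an addition that the Bash script does not make. It targets Python 3.

```python
import subprocess


def revisions_to_check(start_rev, youngest):
    """Return revisions START_REV..YOUNGEST inclusive, as strings,
    mirroring 'seq $START_REV $YOUNGEST' (empty if START_REV > YOUNGEST)."""
    return [str(rev) for rev in range(int(start_rev), int(youngest) + 1)]


def fix_repo(repo_dir, start_rev, fixer='./fixer/fix-rev.py'):
    """Hypothetical driver: run the fixer on each revision and return the
    list of revisions for which it exited with a non-zero status."""
    # 'svnlook youngest REPO' prints the youngest revision number.
    youngest = subprocess.check_output(['svnlook', 'youngest', repo_dir])
    failed = []
    for rev in revisions_to_check(start_rev, youngest.decode().strip()):
        print("=== r" + rev)
        if subprocess.call([fixer, repo_dir, rev]) != 0:
            failed.append(rev)
    return failed


# Only the pure helper is exercised here; fix_repo() needs a real repository.
print(revisions_to_check(3, 5))
```

Calling fix_repo('svn-repo', 12950) would then process every revision from r12950 up to the youngest one; the repository path and revision number in that call are placeholders.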

Added: subversion/trunk/contrib/server-side/fsfsfixer/fixer/__init__.py
URL: http://svn.apache.org/viewvc/subversion/trunk/contrib/server-side/fsfsfixer/fixer/__init__.py?rev=1092641&view=auto
==============================================================================
    (empty)

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/__init__.py
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/__init__.py
------------------------------------------------------------------------------
    svn:mime-type = text/x-python

Added: subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py
URL: http://svn.apache.org/viewvc/subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py?rev=1092641&view=auto
==============================================================================
--- subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py (added)
+++ subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py Fri Apr 15 09:37:54 2011
@@ -0,0 +1,108 @@
+#!/usr/bin/env python
+
+usage = """
+Print the correct FSFS node-rev id, given one that is correct except for
+its byte-offset part.
+Usage: $0 REPO-DIR FSFS-ID-WITH-BAD-OFFSET
+Example:
+  Result of running 'svnadmin verify':
+    svnadmin: Corrupt node-revision '5-12302.1-12953.r12953/29475'
+  Invocation of this script:
+    $ $0 svn-repo 5-12302.1-12953.r12953/29475
+  Output of this script:
+    5-12302.1-12953.r12953/29255
+"""
+
+import os, sys
+
+class FixError(Exception):
+  """An exception for any kind of inability to repair the repository."""
+  pass
+
+def parse_id(id):
+  """Return the (NODEREV, REV, OFFSET) of ID, where ID is of the form
+     "NODEREV/OFFSET", and NODEREV is of the form "SOMETHING.rREV".
+  """
+  noderev, offset = id.split('/')
+  _, rev = noderev.split('.r')
+  return noderev, rev, offset
+
+def rev_file_path(repo_dir, rev):
+  return os.path.join(repo_dir, 'db', 'revs', rev)
+
+def rev_file_indexes(repo_dir, rev):
+  """Return (ids, texts), where IDS is a dictionary of all node-rev ids
+     defined in revision REV of the repo at REPO_DIR, in the form
+     {noderev: full id}, and TEXTS is an array of
+     (offset, size, expanded-size, csum [,sha1-csum, uniquifier]) tuples
+     taken from all the "text: REV ..." lines in revision REV."""
+  ids = {}
+  texts = []
+  for line in open(rev_file_path(repo_dir, rev)):
+    if line.startswith('id: '):
+      id = line.replace('id: ', '').rstrip()
+      id_noderev, id_rev, _ = parse_id(id)
+      assert id_rev == rev
+      ids[id_noderev] = id
+    if line.startswith('text: ' + rev + ' '):  # also 'props:' lines?
+      fields = line.split()
+      texts.append(tuple(fields[2:]))
+  return ids, texts
+
+def find_good_id(repo_dir, bad_id):
+  """Return the node-rev id that is like BAD_ID but has the byte-offset
+     part corrected, by looking in the revision file in the repository
+     at REPO_DIR.
+
+     ### TODO: Parsing of the rev file should skip over node-content data
+         when searching for a line matching "id: <id>", to avoid the
+         possibility of a false match.
+  """
+
+  noderev, rev, bad_offset = parse_id(bad_id)
+  ids, _ = rev_file_indexes(repo_dir, rev)
+
+  if noderev not in ids:
+    raise FixError("NodeRev Id '" + noderev + "' not found in r" + rev)
+  return ids[noderev]
+
+def find_good_rep_header(repo_dir, rev, size):
+  """Find a rep header that matches REV and SIZE.
+     Return the correct offset."""
+  _, texts = rev_file_indexes(repo_dir, rev)
+  n_matches = 0
+  for fields in texts:
+    if fields[1] == size:
+      offset = fields[0]
+      n_matches += 1
+  if n_matches != 1:
+    raise FixError("%d matches for r%s, size %s" % (n_matches, rev, size))
+  return offset
+
+
+if __name__ == '__main__':
+
+  if len(sys.argv) == 4:
+    repo_dir = sys.argv[1]
+    rev = sys.argv[2]
+    size = sys.argv[3]
+    print "Good offset:", find_good_rep_header(repo_dir, rev, size)
+    sys.exit(0)
+
+  if len(sys.argv) != 3:
+    print >>sys.stderr, usage
+    sys.exit(1)
+
+  repo_dir = sys.argv[1]
+  bad_id = sys.argv[2]
+
+  good_id = find_good_id(repo_dir, bad_id)
+
+  # Replacement ID must be the same length, otherwise I don't know how to
+  # reconstruct the file so as to preserve all offsets.
+  # ### TODO: This check should be in the caller rather than here.
+  if len(good_id) != len(bad_id):
+    print >>sys.stderr, "warning: the good ID has a different length: " + \
+                        "bad id '" + bad_id + "', good id '" + good_id + "'"
+
+  print good_id

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py
------------------------------------------------------------------------------
    svn:executable = *

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/find_good_id.py
------------------------------------------------------------------------------
    svn:mime-type = text/x-python
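
The lookup performed by find_good_id() can be illustrated without a repository. The sketch below is not part of the commit and runs under Python 3. It works on an in-memory list of lines rather than a file under db/revs, the names index_ids and correct_id are hypothetical, and the sample fragment is made up, using the ID values from the usage text above.

```python
def parse_id(node_rev_id):
    """Split 'SOMETHING.rREV/OFFSET' into (noderev, rev, offset)."""
    noderev, _, offset = node_rev_id.rpartition('/')
    _, _, rev = noderev.rpartition('.r')
    return noderev, rev, offset


def index_ids(rev_file_lines):
    """Map each noderev to its full ID, taken from the 'id: ' lines."""
    ids = {}
    for line in rev_file_lines:
        if line.startswith('id: '):
            full_id = line[len('id: '):].rstrip()
            ids[parse_id(full_id)[0]] = full_id
    return ids


def correct_id(rev_file_lines, bad_id):
    """Return the ID recorded in the rev file for BAD_ID's noderev."""
    noderev, rev, _ = parse_id(bad_id)
    ids = index_ids(rev_file_lines)
    if noderev not in ids:
        raise LookupError("node-rev '%s' not found in r%s" % (noderev, rev))
    return ids[noderev]


# Made-up rev file fragment; only the 'id: ' line matters to the lookup.
sample = [
    "id: 5-12302.1-12953.r12953/29255\n",
    "type: file\n",
    "count: 3\n",
]
print(correct_id(sample, "5-12302.1-12953.r12953/29475"))
# prints 5-12302.1-12953.r12953/29255
```

Like the committed code, this matches any line that begins with "id: ", so it shares the false-match risk noted in the TODO in find_good_id().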

Added: subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py
URL: http://svn.apache.org/viewvc/subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py?rev=1092641&view=auto
==============================================================================
--- subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py (added)
+++ subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py Fri Apr 15 09:37:54 2011
@@ -0,0 +1,240 @@
+#!/usr/bin/env python
+
+usage = """
+Fix a bad FSFS revision file.
+Usage: $0 REPO-DIR REVISION
+"""
+
+import os, sys, re, subprocess
+from subprocess import Popen, PIPE
+
+from find_good_id import FixError, rev_file_path, find_good_id, find_good_rep_header
+
+
+# ----------------------------------------------------------------------
+# Configuration
+
+# Path and file name of the 'svnadmin' and 'svnlook' programs
+SVNADMIN = 'svnadmin'
+SVNLOOK = 'svnlook'
+
+# Verbosity: True for verbose, or False for quiet
+VERBOSE = True
+
+# Global dictionaries recording the fixes made
+fixed_ids = {}
+fixed_checksums = {}
+
+
+# ----------------------------------------------------------------------
+# Functions
+
+# Print a message, only if 'verbose' mode is enabled.
+def verbose_print(msg):
+  if VERBOSE:
+    print msg
+
+# Log a message: print it if verbose (writing to a log file is not implemented).
+def log(msg):
+  #print >>$REPO/fix-ids.log, msg
+  verbose_print(msg)
+
+def run_cmd_quiet(cmd, *args):
+  retcode = subprocess.call([cmd] + list(args))
+  return retcode
+
+# Execute the command given by CMD and ARGS, and also log it.
+def run_cmd(cmd, *args):
+  log("CMD: " + cmd + ' ' + ' '.join(list(args)))
+  return run_cmd_quiet(cmd, *args)
+
+def replace_in_file(filename, old, new):
+  """Replace the string OLD with the string NEW in file FILENAME.
+     Replace all occurrences.  Raise an error if nothing changes."""
+
+  verbose_print("Replacing '" + old + "' in file '" + filename + "'\n" +
+                "    with  '" + new + "'")
+  # Note: '/' can't be the delimiter.  \Q...\E matches OLD literally; 'g' = all.
+  run_cmd('perl', '-pi.bak', '-e', "s,\\Q" + old + "\\E," + new + ",g", filename)
+  if run_cmd_quiet('cmp', '--quiet', filename, filename + '.bak') == 0:
+    raise FixError("'" + filename + "' is unchanged after Perl substitution.")
+  os.remove(filename + '.bak')
+
+def replace_in_rev_file(repo_dir, rev, old, new):
+  rev_file = rev_file_path(repo_dir, rev)
+  replace_in_file(rev_file, old, new)
+
+# Fix a node-rev ID that has a bad byte-offset part.  Look up the correct
+# byte-offset by using the rest of the ID, which necessarily points into an
+# older revision or the same revision.  Fix all occurrences within REV_FILE.
+#
+# ### TODO: Fix occurrences in revisions between <ID revision> and <REV>,
+#   since the error reported for <REV> might actually exist in an older
+#   revision that is referenced by <REV>.
+#
+def fix_id(repo_dir, rev, bad_id):
+
+  # Find the GOOD_ID to replace BAD_ID.
+  if bad_id == "6-12953.0.r12953/30623":
+    good_id = "0-12953.0.r12953/30403"
+  else:
+    good_id = find_good_id(repo_dir, bad_id)
+
+  # Replacement ID must be the same length, otherwise I don't know how to
+  # reconstruct the file so as to preserve all offsets.
+  if len(good_id) != len(bad_id):
+    raise FixError("Can't handle a replacement ID with a different length: " +
+                   "bad id '" + bad_id + "', good id '" + good_id + "'")
+
+  if good_id == bad_id:
+    raise FixError("The ID supplied is already correct: " +
+                   "good id '" + good_id + "'")
+
+  print "Fixing id: " + bad_id + " -> " + good_id
+  replace_in_rev_file(repo_dir, rev, bad_id, good_id)
+  fixed_ids[bad_id] = good_id
+
+def fix_checksum(repo_dir, rev, old_checksum, new_checksum):
+  """Change all occurrences of OLD_CHECKSUM to NEW_CHECKSUM in the revision
+     file for REV in REPO_DIR."""
+
+  assert len(old_checksum) and len(new_checksum)
+  assert old_checksum != new_checksum
+
+  print "Fixing checksum: " + old_checksum + " -> " + new_checksum
+  replace_in_rev_file(repo_dir, rev, old_checksum, new_checksum)
+  fixed_checksums[old_checksum] = new_checksum
+
+def fix_delta_ref(repo_dir, rev, bad_rev, bad_offset, bad_size):
+  """Fix a "DELTA <REV> <OFFSET> <SIZE>" line in the revision file for REV
+     in REPO_DIR, where <OFFSET> is wrong."""
+  good_offset = find_good_rep_header(repo_dir, bad_rev, bad_size)
+  old_line = ' '.join(['DELTA', bad_rev, bad_offset, bad_size])
+  new_line = ' '.join(['DELTA', bad_rev, good_offset, bad_size])
+  print "Fixing delta ref:", old_line, "->", new_line
+  replace_in_rev_file(repo_dir, rev, old_line, new_line)
+
+
+def handle_one_error(repo_dir, rev, error_lines):
+  """If ERROR_LINES describes an error we know how to fix, then fix it.
+     Return True if fixed, False if not fixed."""
+
+  line1 = error_lines[0]
+  match = re.match(r"svn.*: Corrupt node-revision '(.*)'", line1)
+  if match:
+    # Fix it.
+    bad_id = match.group(1)
+    verbose_print(error_lines[0])
+    fix_id(repo_dir, rev, bad_id)
+
+    # Verify again, and expect to discover a checksum mismatch.
+    # verbose_print("Fixed an ID; now verifying to discover the checksum we need to update")
+    # error_lines = ...
+    # if error_lines[0] != "svn.*: Checksum mismatch while reading representation:":
+    #   raise FixError("expected a checksum mismatch after replacing the Id;" +
+    #                  "  instead, got this output from 'svnadmin verify -q':" +
+    #                  "//".join(error_lines))
+    #
+    # expected = ...
+    # actual   = ...
+    # fix_checksum(repo_dir, rev, expected, actual)
+
+    return True
+
+  match = re.match(r"svn.*: Checksum mismatch while reading representation:", line1)
+  if match:
+    verbose_print(error_lines[0])
+    verbose_print(error_lines[1])
+    verbose_print(error_lines[2])
+    expected = re.match(r' *expected: *([^ ]*)', error_lines[1]).group(1)
+    actual   = re.match(r' *actual: *([^ ]*)',   error_lines[2]).group(1)
+    fix_checksum(repo_dir, rev, expected, actual)
+    return True
+
+  match = re.match(r"svn.*: Corrupt representation '([0-9]*) ([0-9]*) ([0-9]*) .*'", line1)
+  if match:
+    # Extract the bad reference. We expect only 'offset' is actually bad, in
+    # the known kind of corruption that we're targeting.
+    bad_rev = match.group(1)
+    bad_offset = match.group(2)
+    bad_size = match.group(3)
+    fix_delta_ref(repo_dir, rev, bad_rev, bad_offset, bad_size)
+    return True
+
+  return False
+
+def fix_one_error(repo_dir, rev):
+  """Verify, and if there is an error we know how to fix, then fix it.
+     Return False if no error, True if fixed, exception if can't fix."""
+
+  # Capture the output of 'svnadmin verify' (ignoring any debug-build output)
+  p = Popen([SVNADMIN, 'verify', '-q', '-r'+rev, repo_dir], stdout=PIPE, stderr=PIPE)
+  _, stderr = p.communicate()
+  svnadmin_err = []
+  for line in stderr.splitlines():
+    if line.find('(apr_err=') == -1:
+      svnadmin_err.append(line)
+
+  if svnadmin_err == []:
+    return False
+
+  try:
+    if handle_one_error(repo_dir, rev, svnadmin_err):
+      return True
+  except FixError, e:
+    print 'warning:', e
+    print "Trying 'svnlook' instead."
+    pass
+
+  # At this point, we've got an 'svnadmin' error that we don't know how to
+  # handle.  Before giving up, see if 'svnlook' gives a different error,
+  # one that we *can* handle.
+
+  # Capture the output of 'svnlook tree' (ignoring any debug-build output)
+  p = Popen([SVNLOOK, 'tree', '-r'+rev, repo_dir], stdout=PIPE, stderr=PIPE)
+  _, stderr = p.communicate()
+  svnlook_err = []
+  for line in stderr.splitlines():
+    if line.find('(apr_err=') == -1:
+      svnlook_err.append(line)
+
+  if svnlook_err == []:
+    print 'warning: svnlook did not find an error'
+  else:
+    if handle_one_error(repo_dir, rev, svnlook_err):
+      return True
+
+  raise FixError("unfixable error:\n  " + "\n  ".join(svnadmin_err))
+
+
+# ----------------------------------------------------------------------
+# Main program
+
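+def fix_rev(repo_dir, rev):
+  """Verify REV in REPO_DIR, fixing known errors until it verifies OK."""
+
+  # ### TODO: Back up the file; the copy step below is not implemented yet.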
+  if not os.path.exists(rev_file_path(repo_dir, rev) + '.orig'):
+    pass
+    # cp -a "$FILE" "$FILE.orig"
+
+  # Keep looking for verification errors in r$REV and fixing them while we can.
+  while fix_one_error(repo_dir, rev):
+    pass
+  print "Revision " + rev + " verifies OK."
+
+
+if __name__ == '__main__':
+
+  if len(sys.argv) != 3:
+    print >>sys.stderr, usage
+    sys.exit(1)
+
+  repo_dir = sys.argv[1]
+  rev = sys.argv[2]
+
+  try:
+    fix_rev(repo_dir, rev)
+  except FixError, e:
+    print 'error:', e
+    sys.exit(1)

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py
------------------------------------------------------------------------------
    svn:executable = *

Propchange: subversion/trunk/contrib/server-side/fsfsfixer/fixer/fix-rev.py
------------------------------------------------------------------------------
    svn:mime-type = text/x-python
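
fix_id() insists that the replacement ID has the same length as the bad one, so that no byte offset in the revision file shifts. A pure-Python way to apply such a substitution, without shelling out to perl and cmp, might look like the sketch below. It is an illustration only, not the committed method: the names are hypothetical, it runs under Python 3, and it enforces the same-length rule for every replacement, which is stricter than the committed code (that checks lengths only in fix_id()).

```python
def replace_same_length(data, old, new):
    """Replace every occurrence of OLD with NEW in the bytes DATA.

    Raises ValueError if the lengths differ (which would shift offsets)
    or if OLD does not occur at all.
    """
    if len(old) != len(new):
        raise ValueError("replacement would change the file length")
    if old not in data:
        raise ValueError("nothing to replace")
    return data.replace(old, new)


def replace_in_file(filename, old, new):
    """Hypothetical file-level wrapper around replace_same_length().
    Unlike a careful repair tool, it keeps no backup of the original."""
    with open(filename, 'rb') as f:
        data = f.read()
    fixed = replace_same_length(data, old, new)
    with open(filename, 'wb') as f:
        f.write(fixed)


# Made-up fragment using the ID values from find_good_id.py's usage text.
fragment = b"id: 5-12302.1-12953.r12953/29475\ntype: file\n"
fixed = replace_same_length(fragment,
                            b"5-12302.1-12953.r12953/29475",
                            b"5-12302.1-12953.r12953/29255")
print(len(fixed) == len(fragment))
```

Working on bytes rather than text avoids any decoding of the binary delta data that a revision file also contains, and a plain substring replacement sidesteps regular-expression metacharacters such as the '.' in node-rev IDs.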


