couchdb-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Couchdb Wiki] Update of "ReleaseNotice1.0.0RepairTool" by JanLehnardt
Date Sun, 26 Feb 2012 22:07:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Couchdb Wiki" for change notification.

The "ReleaseNotice1.0.0RepairTool" page has been changed by JanLehnardt:
http://wiki.apache.org/couchdb/ReleaseNotice1.0.0RepairTool?action=diff&rev1=1&rev2=2

Comment:
add repair tool page

   5. Start CouchDB 1.0.1.
   6. Wait until CouchDB says "Time to Relax.".
    6.1. CouchDB will have switched the config to `recovery = false`.
-   6.2. CouchDB will 
+   6.2. CouchDB will have created `/lost\_and\_found\_$dbname\_$timestamp` for every db that has orphaned data.
+  7. Go to `/lost\_and\_found\_$dbname\_$timestamp` and check whether any lost data was recovered.
+  8. Replicate `/lost\_and\_found\_$dbname\_$timestamp` to `/$dbname` to restore all data (see the sketch below).
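+ 
+ For step 8, a minimal sketch of the replication call, assuming CouchDB listens on `http://127.0.0.1:5984` and using placeholder database names; the POST to `/_replicate` is CouchDB's standard replication API:
+ 
+     # restore_lost_and_found.py -- sketch only; adjust host and db names to your setup
+     import json
+     import urllib.request
+ 
+     COUCH = "http://127.0.0.1:5984"
+ 
+     def replicate(source, target):
+         """Ask CouchDB to copy every document from source into target via POST /_replicate."""
+         body = json.dumps({"source": source, "target": target}).encode("utf-8")
+         req = urllib.request.Request(
+             COUCH + "/_replicate", data=body,
+             headers={"Content-Type": "application/json"})
+         return json.load(urllib.request.urlopen(req))
+ 
+     # placeholder names: the real lost+found db carries the original db name and a timestamp
+     print(replicate("lost_and_found_mydb_1234567890", "mydb"))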
  
+ 
+ 
+ = A generalized repair algorithm =
+ 
+ 
+ == The basic repair algorithm: ==
+ 
+     for Node in BTreeNodes
+       run the compactor to create a lost and found database file that treats Node as a root
+     use the replicator to merge all the lost and found database files into one database
+     Done!
+ 
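+ A hedged sketch of how this loop might be driven, reusing the `replicate()` helper from the step-8 sketch above. There is no public API for running the compactor from an arbitrary root, so `compact_from_root` below is purely hypothetical; only the merge step uses a real CouchDB API (`/_replicate`):
+ 
+     def compact_from_root(dbfile, root_offset, out_name):
+         """Hypothetical hook: run a patched compactor over dbfile, treating
+         root_offset as the root, and write the result as database out_name."""
+         raise NotImplementedError("requires a patched compactor")
+ 
+     def repair(dbfile, roots, merged="lost_and_found_merged"):
+         # 1. one lost-and-found database per candidate root
+         names = ["lost_and_found_%d" % i for i, _ in enumerate(roots)]
+         for name, root in zip(names, roots):
+             compact_from_root(dbfile, root, name)
+         # 2. merge them all into a single database with the replicator
+         for name in names:
+             replicate(name, merged)
+         return merged
+ 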
+ It is an optimization to prune the Nodes in BTreeNodes down to the minimum number needed without losing any data. If we ran the repair for every single node in the original file, we'd end up with the same final result as if we pruned optimally; the difference is that we'd use exponentially more disk space in the process.
+ 
+ [Another space-saving optimization would be to keep only one resident lost+found db around, have one active l+f db per compaction, and replicate the current one into the resident one after each compaction. I don't know how hard it'd be to find the minimal set of nodes needed for the other optimization, so this might be a reasonable alternative to keep disk usage in bounds, although not as optimal -- Jan]
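+ 
+ A sketch of this variant, again using the hypothetical `compact_from_root` and the `replicate()` helper and `COUCH` URL from the sketches above: fold each per-compaction l+f db into one resident db and delete it right away, so at most one temporary l+f file exists at any time:
+ 
+     def repair_bounded(dbfile, roots, resident="lost_and_found_resident"):
+         for root in roots:
+             compact_from_root(dbfile, root, "lost_and_found_tmp")
+             replicate("lost_and_found_tmp", resident)
+             # DELETE /lost_and_found_tmp so only the resident copy stays on disk
+             req = urllib.request.Request(COUCH + "/lost_and_found_tmp", method="DELETE")
+             urllib.request.urlopen(req)
+         return resident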
+ 
+ == Finding only the btree nodes we need: ==
+ 
+ It is hard to tell roots apart from other nodes. The way to do that is to work backwards through the file, keeping track of the pointers found in each node. If a node's offset matches a pointer you've already seen, you don't need to capture it. If nothing has pointed to a node yet, capture it: because the file is append-only, a node can only be referenced by nodes written after it, so by the time the backward scan reaches it every possible referrer has already been seen. This gives you all the roots.
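+ 
+ A sketch of that backward scan. Decoding CouchDB's on-disk node format is out of scope here, so `iter_nodes_backwards`, assumed to yield `(offset, child_pointers)` for every node from the end of the file to the start, is a hypothetical helper:
+ 
+     def find_candidate_roots(dbfile):
+         """Scan nodes from the end of the file towards the start; a node that no
+         later node points to is a candidate root."""
+         seen_pointers = set()
+         roots = []
+         for offset, child_pointers in iter_nodes_backwards(dbfile):  # hypothetical helper
+             if offset not in seen_pointers:
+                 roots.append(offset)              # nothing later points at it
+             seen_pointers.update(child_pointers)  # remember what this node points to
+         return roots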
+ 
+ 
+ This means we can also keep track of the pointers that headers point to. If a root is pointed to by a header, any later roots in the file build on the state that header describes and already take it into account. Therefore any root pointed to by a header can be disregarded.
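+ 
+ Continuing the sketch, that rule could be applied as a final filter; `iter_headers` and `header_root_pointers` are again hypothetical accessors for the headers in the file and the roots they record:
+ 
+     def prune_header_roots(dbfile, roots):
+         """Drop any candidate root that some header already points to."""
+         referenced = set()
+         for header in iter_headers(dbfile):                  # hypothetical helper
+             referenced.update(header_root_pointers(header))  # roots this header commits
+         return [r for r in roots if r not in referenced]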
+ 
+ When we are done, we will have a list of roots that are not reflected by the last valid header in the file (some of these may be after the last valid header), **nor are they reflected by later roots in the same "lineage"**. This is the set of nodes that we need to use as the starting point for compactions.
+ 
+ **The part in bold is false and means we will collect unneeded roots.** I'm currently looking into a way to distinguish superseded roots from final roots.
+ 
+ We will end up with as many "lost and found" database files as we have roots here. Once we have them, we can begin the compact-and-replicate routine described above.
+ 
