couchdb-dev mailing list archives

From Mikeal Rogers <mikeal.rog...@gmail.com>
Subject Re: data recovery tool progress
Date Tue, 10 Aug 2010 01:50:26 GMT
I pulled down the latest code from Adam's branch @
7080ff72baa329cf6c4be2a79e71a41f744ed93b.

Running timer:tc(couch_db_repair, make_lost_and_found, ["multi_conflict"]).
on a database with 200 lost updates spanning 200 restarts (
http://github.com/mikeal/couchtest/blob/master/multi_conflict.couch ) took
about 101 seconds.

I tried running against a larger database (
http://github.com/mikeal/couchtest/blob/master/testwritesdb.couch ) and I
got this exception:

http://gist.github.com/516491

-Mikeal



On Mon, Aug 9, 2010 at 6:09 PM, Randall Leeds <randall.leeds@gmail.com> wrote:

> Summing up what went on in IRC for those who were absent.
>
> The latest progress is on Adam's branch at
> http://github.com/kocolosk/couchdb/tree/db_repair
>
> couch_db_repair:make_lost_and_found/1 attempts to create a new
> lost+found/DbName database into which it merges all nodes that are
> not reachable from anywhere else (that is, not referenced by any
> other node found in a full file scan or by any header pointer).
>
> Currently, make_lost_and_found uses Volker's repair (from the
> couch_db_repair_b module, also in Adam's branch).
> Adam found that couch_file calls were the bottleneck and that the
> repair process was taking a very long time, so he added
> couch_db_repair:find_nodes_quickly/1, which reads 1MB chunks as
> binaries and processes each chunk to find nodes instead of scanning
> back one byte at a time. It is not yet hooked up to the repair
> mechanism.
>
> Making progress. Go team.
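
For anyone who wants to picture the chunked scan described in the summary
above, here is a rough Python sketch of the general idea. It is not Adam's
Erlang code and makes no claim about how find_nodes_quickly/1 is written:
the function name find_marker_offsets, the marker bytes, and the way
matches that straddle a chunk boundary are handled are all assumptions made
for illustration.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB per read, matching the chunk size in the summary


def find_marker_offsets(stream, marker, chunk_size=CHUNK_SIZE):
    """Return the absolute offset of every occurrence of `marker` in `stream`.

    Hypothetical helper: the stream is read in large chunks and each chunk is
    searched in memory, rather than issuing one read per byte.
    """
    if not marker:
        raise ValueError("marker must be non-empty")
    offsets = []
    tail = b""  # trailing bytes of the previous buffer, kept for boundary matches
    base = 0    # absolute offset in the stream of buf[0]
    keep = len(marker) - 1
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        start = 0
        while True:
            idx = buf.find(marker, start)
            if idx == -1:
                break
            offsets.append(base + idx)
            start = idx + 1  # step by one so overlapping matches are reported too
        # The tail is shorter than the marker, so it can never hold a whole
        # match by itself and no offset is reported twice.
        tail = buf[-keep:] if keep else b""
        base += len(buf) - len(tail)
    return offsets
```

Something like find_marker_offsets(open(path, "rb"), b"kv_node") would then
list candidate positions in a single pass over the file, where path and the
marker bytes are placeholders. Each candidate would still have to be decoded
and validated before it could be treated as a real node.
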
>
> On Mon, Aug 9, 2010 at 13:52, Mikeal Rogers <mikeal.rogers@gmail.com>
> wrote:
> > jchris suggested on IRC that I try a normal doc update and see if that fixes
> > it.
> >
> > It does. After a new doc was created the dbinfo doc count was back to
> > normal.
> >
> > -Mikeal
> >
> > On Mon, Aug 9, 2010 at 1:39 PM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
> >
> >> Ok, I pulled down this code and tested against a database with a ton of
> >> missing writes right before a single restart.
> >>
> >> Before restart this was the database:
> >>
> >>   {
> >>     db_name: "testwritesdb"
> >>     doc_count: 124969
> >>     doc_del_count: 0
> >>     update_seq: 124969
> >>     purge_seq: 0
> >>     compact_running: false
> >>     disk_size: 54857478
> >>     instance_start_time: "1281384140058211"
> >>     disk_format_version: 5
> >>   }
> >>
> >> After restart it was this:
> >>
> >>   {
> >>     db_name: "testwritesdb"
> >>     doc_count: 1
> >>     doc_del_count: 0
> >>     update_seq: 1
> >>     purge_seq: 0
> >>     compact_running: false
> >>     disk_size: 54857478
> >>     instance_start_time: "1281384593876026"
> >>     disk_format_version: 5
> >>   }
> >>
> >> After repair, it's this:
> >>
> >>   {
> >>     db_name: "testwritesdb"
> >>     doc_count: 1
> >>     doc_del_count: 0
> >>     update_seq: 124969
> >>     purge_seq: 0
> >>     compact_running: false
> >>     disk_size: 54857820
> >>     instance_start_time: "1281385990193289"
> >>     disk_format_version: 5
> >>     committed_update_seq: 124969
> >>   }
> >>
> >> All the sequences are there and hitting _all_docs shows all the documents,
> >> so why is the doc_count only 1 in the dbinfo?
> >>
> >> -Mikeal
> >>
> >> On Mon, Aug 9, 2010 at 11:53 AM, Filipe David Manana <fdmanana@apache.org> wrote:
> >>
> >>> For the record (and people not on IRC), the code at:
> >>>
> >>> http://github.com/fdmanana/couchdb/commits/db_repair
> >>>
> >>> is working for at least simple cases. Use
> >>> couch_db_repair:repair(DbNameAsString).
> >>> There's one TODO: update the reduce values for the by_seq and by_id
> >>> BTrees.
> >>>
> >>> If anyone wants to help with this, you're welcome.
> >>>
> >>> On Mon, Aug 9, 2010 at 6:12 PM, Mikeal Rogers <mikeal.rogers@gmail.com> wrote:
> >>>
> >>> > I'm starting to create a bunch of test db files that expose this bug under
> >>> > different conditions like multiple restarts, across compaction, variances
> >>> > in updates that might cause conflicts, etc.
> >>> >
> >>> > http://github.com/mikeal/couchtest
> >>> >
> >>> > The README outlines what was done to the db's and what needs to be
> >>> > recovered.
> >>> >
> >>> > -Mikeal
> >>> >
> >>> > On Mon, Aug 9, 2010 at 9:33 AM, Filipe David Manana <fdmanana@apache.org> wrote:
> >>> >
> >>> > > On Mon, Aug 9, 2010 at 5:22 PM, Robert Newson <robert.newson@gmail.com> wrote:
> >>> > >
> >>> > > > Doesn't this bit;
> >>> > > >
> >>> > > > -        Db#db{waiting_delayed_commit=nil};
> >>> > > > +        Db;
> >>> > > > +        % Db#db{waiting_delayed_commit=nil};
> >>> > > >
> >>> > > > revert the bug fix?
> >>> > > >
> >>> > >
> >>> > > That's intentional, for my local testing.
> >>> > > That patch obviously isn't anything close to final; it's still too
> >>> > > experimental.
> >>> > >
> >>> > > >
> >>> > > > B.
> >>> > > >
> >>> > > > On Mon, Aug 9, 2010 at 5:09 PM, Jan Lehnardt <jan@apache.org> wrote:
> >>> > > > > Hi All,
> >>> > > > >
> >>> > > > > Filipe jumped in to start working on the recovery tool, but he isn't
> >>> > > > > done yet.
> >>> > > > >
> >>> > > > > Here's the current patch:
> >>> > > > >
> >>> > > > > http://www.friendpaste.com/4uMngrym4r7Zz4R0ThSHbz
> >>> > > > >
> >>> > > > > It is not done and is at a very early stage, but any help on this is
> >>> > > > > greatly appreciated.
> >>> > > > >
> >>> > > > > The current state is (in Filipe's words):
> >>> > > > >  - i can detect that a file needs repair
> >>> > > > >  - and get the last btree roots from it
> >>> > > > >  - "only" missing: get last db seq num
> >>> > > > >  - write new header
> >>> > > > >  - and deal with the local docs btree (if exists)
> >>> > > > >
> >>> > > > > Thanks!
> >>> > > > > Jan
> >>> > > > > --
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Filipe David Manana,
> >>> > > fdmanana@apache.org
> >>> > >
> >>> > > "Reasonable men adapt themselves to the world.
> >>> > >  Unreasonable men adapt the world to themselves.
> >>> > >  That's why all progress depends on unreasonable men."
> >>> > >
> >>> >
> >>>
> >>>
> >>>
> >>
> >>
> >
>